
Nginx 502 Bad Gateway: GMD (111: Connection refused) while connecting to upstream #70

Closed
alokispandey opened this issue Nov 4, 2023 · 19 comments


@alokispandey

Hi,
Recently I installed GMD successfully using the installer. All services are active, but I am still getting the error "502 Bad Gateway".

Installation Closing message:

Galera Manager installation finished. Enter http://10.112.48.69 in a web browser to access. Please note, you chose to use an unencrypted http protocol, such connections are prone to several types of security issues. Always use only trusted networks when connecting to the service.
INFO[0077] Logs DB url: http://10.112.48.69:8081
IMPORTANT: ensure TCP ports 80, 8081 are open in firewall.
INFO[0077] Below you can see Logs DB credentials:
DB name: gmd
DB user: gmd
DB password: 9iKc2lXzdW
The installation log is located at /tmp/gm-installer.log

Service status: NGINX

● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2023-11-04 06:03:11 GMT; 28min ago
Process: 1089519 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
Process: 1089517 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=0/SUCCESS)
Process: 1089511 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)
Main PID: 1089520 (nginx)
Tasks: 9 (limit: 205278)
Memory: 11.6M
CGroup: /system.slice/nginx.service
├─1089520 nginx: master process /usr/sbin/nginx
├─1089521 nginx: worker process
├─1089522 nginx: worker process
├─1089523 nginx: worker process
├─1089524 nginx: worker process
├─1089525 nginx: worker process
├─1089526 nginx: worker process
├─1089527 nginx: worker process
└─1089528 nginx: worker process

Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Starting The nginx HTTP and reverse proxy server...
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl nginx[1089517]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl nginx[1089517]: nginx: configuration file /etc/nginx/nginx.conf test is successf>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Started The nginx HTTP and reverse proxy server.

Service status: GMD

● gmd.service - gmd - galera manager daemon
Loaded: loaded (/usr/lib/systemd/system/gmd.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2023-11-04 06:31:25 GMT; 17s ago
Main PID: 1091659 (gmd)
Tasks: 12 (limit: 205278)
Memory: 20.8M
CGroup: /system.slice/gmd.service
└─1091659 /usr/bin/gmd run

Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Started gmd - galera manager daemon.
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="Starting gmd" func=>
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="Listening on 127.0.>
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="ConfigDir = /var/li>
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="LogsDir = /var/log

Service status: influxd

influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/usr/lib/systemd/system/influxdb.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2023-11-04 06:03:11 GMT; 30min ago
Docs: https://docs.influxdata.com/influxdb/
Process: 1089450 ExecStart=/usr/lib/influxdb/scripts/influxd-systemd-start.sh (code=exited, status=0/SUCCESS)
Main PID: 1089455 (influxd)
Tasks: 13 (limit: 205278)
Memory: 58.0M
CGroup: /system.slice/influxdb.service
└─1089455 /usr/bin/influxd

Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Starting InfluxDB is an open-source, distributed, time series databa>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089456]: Command "print-config" is deprecated, use the>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089479]: Command "print-config" is deprecated, use the>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089493]: Command "print-config" is deprecated, use the>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089450]: InfluxDB started
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Started InfluxDB is an open-source, distributed, time series databas>

LOGS

[]# tail -f /var/log/gmd/default.log
{"channel-type":"app","file":"/go/pkg/cmd/run.go:64","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"ConfigDir = /var/lib/gmd","time":"2023-11-04T06:24:22Z"}
{"channel-type":"app","file":"/go/pkg/cmd/run.go:65","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"LogsDir = /var/log/gmd","time":"2023-11-04T06:24:22Z"}
{"channel-type":"app","file":"/go/pkg/cmd/run.go:60","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Starting gmd","time":"2023-11-04T06:24:52Z"}
{"channel-type":"app","file":"/go/pkg/cmd/run.go:61","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Listening on 127.0.0.1:8000","time":"2023-11-04T06:24:52Z"}

~]# ll /var/log/influxdb/
total 0

~]#tail -f /var/log/nginx/10.112.48.69.gmd.error.log
2023/11/04 06:13:54 [error] 1089521#0: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.112.48.69, server: 10.112.48.69, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8000/", host: "10.112.48.69"

Setup & ENV

  • Installing on Red Hat Enterprise Linux 8.
  • The host reaches the internet through an HTTP proxy, on which domains need to be whitelisted.
  • Whitelisted domains: "*.influxdata.com, *.galera-manager.com, *.fedoraproject.org"

Questions

  1. Which domains, other than those already whitelisted (listed above), need to be whitelisted for GMD to install properly? Does GMD require access to github.com/codership/galera-manager/pkg/cmd to function?
    1. Looking at the GMD logs, it seems GMD needs access to "github.com/codership/galera-manager/pkg/cmd.(*RunCommand)", but "https://github.com/codership/galera-manager/pkg/" returns a 404 page in the browser. Is this a bug?
  2. Please let me know if you need more details from my side.
@esscz

esscz commented Nov 6, 2023

Hello,

It appears that NGINX is encountering difficulties connecting to GMD on port 8000. Could you kindly verify if GMD is accessible on this port? A simple way to check is by using the telnet command:

telnet 127.0.0.1 8000

Thank you for your cooperation.
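If telnet is not installed, the same reachability check can be sketched in Python. The helper name port_is_open is hypothetical and not part of GMD; it only attempts a TCP handshake, so a False result for 127.0.0.1:8000 corresponds to the "111: Connection refused" that nginx logged.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port succeeds (roughly what telnet checks)."""
    try:
        # create_connection resolves the address and connects; the socket is closed right away
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111), timeouts and resolution failures
        return False


if __name__ == "__main__":
    # 127.0.0.1:8000 is the upstream address from the nginx error log above
    print("gmd reachable on 127.0.0.1:8000:", port_is_open("127.0.0.1", 8000))
```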

@alokispandey
Author

alokispandey commented Nov 7, 2023 via email

@esscz

esscz commented Nov 7, 2023

Could you please verify that gmd is actively listening on port 8000? You can do this by running either sudo ss -tulnp | grep 8000 or sudo netstat -tulnp | grep 8000
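Where neither ss nor netstat is available, a rough Python equivalent is to read the kernel's socket tables directly. This sketch assumes the Linux /proc/net/tcp layout (local address in the second column with the port in hex after the colon, state in the fourth column, where 0A means LISTEN). Unlike ss -p, it cannot show which process owns the socket, and the function names are illustrative.

```python
def listening_ports(proc_net_tcp: str) -> set:
    """Parse the text of /proc/net/tcp or /proc/net/tcp6 and return ports in LISTEN state."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0x0A is TCP_LISTEN in the kernel's state numbering
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def is_listening(port: int) -> bool:
    """Check both the IPv4 and IPv6 tables; a missing file (e.g. no IPv6) is skipped."""
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                if port in listening_ports(f.read()):
                    return True
        except FileNotFoundError:
            continue
    return False


if __name__ == "__main__":
    print("something is listening on port 8000:", is_listening(8000))
```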

@esscz

esscz commented Nov 7, 2023

It's also possible that SELinux is preventing the connection. Could you please execute the following commands to check the status of SELinux and to search for any relevant AVC denials?

sestatus

and

sudo ausearch -m avc -ts recent
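The SELinux mode can also be read without sestatus, assuming selinuxfs is mounted at its usual location, /sys/fs/selinux. This is a minimal Python sketch with a hypothetical helper name; sestatus remains the authoritative tool.

```python
from pathlib import Path


def selinux_mode(selinuxfs: Path = Path("/sys/fs/selinux")) -> str:
    """Report the SELinux mode from selinuxfs, similar to the 'Current mode' line of sestatus."""
    enforce = selinuxfs / "enforce"
    if not enforce.is_file():
        # selinuxfs is not mounted when SELinux is disabled
        return "disabled"
    # The enforce file contains "1" in enforcing mode and "0" in permissive mode
    return "enforcing" if enforce.read_text().strip() == "1" else "permissive"


if __name__ == "__main__":
    print("SELinux:", selinux_mode())
```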

@alokispandey
Author

GMD is not listening. I am not sure why the service shows as active.
Nothing appears in the logs, even after enabling debug mode.

[root@nhtlaspsitcdb01 ~]# sestatus
SELinux status: disabled
[root@nhtlaspsitcdb01 ~]# ausearch -m avc -ts recent
Email option is specified but /usr/lib/sendmail doesn't seem executable.
q_depth should be larger than 512 for safety margin

I tried running the gmd service manually using the command below:
[root@nhtlaspsitcdb01 ~]# /usr/bin/gmd --config-dir=/var/lib/gmd --logs-dir=/var/log/gmdv --log-format=json --log-level=debug run --bind-address=127.0.0.1:8000 --influxdb-url=http://gmd:ofrjVFL9XY@nhtlaspsitcdb01.idm.oam.mbnl:8081/gmd
{"file":"/go/pkg/cmd/run.go:60","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Starting gmd","time":"2023-11-07T08:14:47Z"}
{"file":"/go/pkg/cmd/run.go:61","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Listening on 127.0.0.1:8000","time":"2023-11-07T08:14:47Z"}
{"file":"/go/pkg/cmd/run.go:64","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"ConfigDir = /var/lib/gmd","time":"2023-11-07T08:14:47Z"}
{"file":"/go/pkg/cmd/run.go:65","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"LogsDir = /var/log/gmdv","time":"2023-11-07T08:14:47Z"}
Get "http://checkip.amazonaws.com": dial tcp: lookup checkip.amazonaws.com: i/o timeout
[root@nhtlaspsitcdb01 ~]#

I can see a timeout. Could that be the reason? If so, how do I set a proxy for GMD? Our internet access goes through a proxy, and setting an OS-level proxy would break other running applications.
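On the concern about an OS-wide proxy: environment variables are inherited per process, so a proxy can be scoped to gmd alone. The Python sketch below illustrates this with a hypothetical run_with_proxy helper and a placeholder proxy URL. It assumes gmd honours the conventional http_proxy/https_proxy variables (the fix later in this thread suggests it does); honouring no_proxy is a further assumption, although Go's standard HTTP client does so by default.

```python
import os
import subprocess


def run_with_proxy(cmd, proxy_url, no_proxy="localhost,127.0.0.1"):
    """Run cmd with proxy variables set for that child process only."""
    env = dict(os.environ)  # copy, so the caller's own environment is left untouched
    env.update(
        http_proxy=proxy_url,
        https_proxy=proxy_url,
        # Keep connections to local services off the proxy
        no_proxy=no_proxy,
    )
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder proxy URL; the child shell prints the value it inherited
    result = run_with_proxy(["sh", "-c", 'echo "$https_proxy"'], "http://proxy.example:3128")
    print(result.stdout.strip())
```

Other processes on the host keep their existing environment; only the command passed in sees the proxy.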

@alokispandey
Author

Thanks, it's fixed. The issue seems to be that http_proxy was not set. I added my proxy details to

cat /usr/lib/systemd/system/gmd.service

[Unit]
Description=gmd - galera manager daemon
After=network.target

[Service]
EnvironmentFile=/etc/default/gmd
Environment=https_proxy=http://XXXXXXX:3128
Environment=http_proxy=http://XXXXXl:3128
User=gmd
Group=gmd
LimitNOFILE=65536
Restart=on-failure
Type=simple
ExecStart=/usr/bin/gmd run $ARGS

[Install]
WantedBy=default.target

After restarting the gmd service, it started working.
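Editing /usr/lib/systemd/system/gmd.service works, but that file belongs to the package and can be overwritten on upgrade. A drop-in under /etc/systemd/system/gmd.service.d/ survives upgrades, and systemd only picks up either kind of change after systemctl daemon-reload. The Python sketch below renders such a drop-in; the file name proxy.conf, the proxy URL and the no_proxy default are placeholders to adapt.

```python
def proxy_dropin(proxy_url: str, no_proxy: str = "localhost,127.0.0.1") -> str:
    """Render a systemd drop-in that adds proxy variables to a service's [Service] section."""
    lines = ["[Service]"]
    for var in ("http_proxy", "https_proxy"):
        lines.append(f'Environment="{var}={proxy_url}"')
    lines.append(f'Environment="no_proxy={no_proxy}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save the output as /etc/systemd/system/gmd.service.d/proxy.conf, then run:
    #   systemctl daemon-reload && systemctl restart gmd
    print(proxy_dropin("http://proxy.example:3128"), end="")
```

If you prefer to write the lines by hand, systemctl edit gmd opens an editor on an equivalent override file.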

@alokispandey
Author

Now I am getting an error while adding a node to the cluster:
Error

failed to execute cluster config script (RunScriptWithConn)

Process exited with status 4

@esscz

esscz commented Nov 7, 2023

Could you please attach the whole installation log? The reason should be mentioned several lines before the end of the file.
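Since the per-host logs are JSON lines (compare the excerpts quoted further down this thread), the entries just before the end can be pulled out with a short Python sketch. The level and msg keys are taken from those excerpts; the helper name and command-line usage are illustrative.

```python
import json
import sys
from collections import deque


def last_messages(lines, n=20):
    """Return the last n log messages from JSON-lines logs, keeping non-JSON lines verbatim."""
    tail = deque(maxlen=n)  # a bounded deque keeps only the newest n entries
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            tail.append(raw)  # e.g. stack-trace lines
            continue
        if isinstance(entry, dict):
            tail.append(f'{entry.get("level", "?")}: {entry.get("msg", "")}')
        else:
            tail.append(raw)
    return list(tail)


if __name__ == "__main__":
    # Usage (script name hypothetical): python3 last_messages.py cluster-8-host-2.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            print("\n".join(last_messages(f)))
```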

@alokispandey
Author

alokispandey commented Nov 7, 2023

Attached:
cluster-8-host-2.log

I am trying to monitor an existing Galera cluster, and:
1. The repository setup steps in https://galeracluster.com/2021/02/using-galera-manager-to-monitor-your-existing-galera-clusters/ are not working.
dnf install galera-4 reports an error:

Error: Transaction test error:
file /etc/sysconfig/garb from install of galera-4-26.4.16-1.el8.x86_64 conflicts with file from package galera-25.3.35-1.module+el8.6.0+15949+4ba4ec26.x86_64
file /usr/share/man/man8/garbd.8.gz from install of galera-4-26.4.16-1.el8.x86_64 conflicts with file from package galera-25.3.35-1.module+el8.6.0+15949+4ba4ec26.x86_64
2. The repo uses CentOS; I tried replacing it with Red Hat, but still no luck.

@alokispandey
Author

I tried installing a "managed cluster" on a fresh node, still with no luck. Attaching full logs:
cluster-9-host-1.log
cluster-9.log

@esscz

esscz commented Nov 8, 2023

In both cases, the hostname cannot be resolved:

{"channel-type":"stdout","cluster-id":"9","file":"/go/pkg/log/iolog.go:37","func":"github.com/codership/galera-manager/pkg/log.(*IOLog).Write","host-id":"1","level":"info","msg":"failed: Name or service not known.\r\nwget: unable to resolve host address ‘downloads.mariadb.com’\r\n","ssh-host":"10.102.48.39","time":"2023-11-08T06:14:29Z"}

and

{"channel-type":"stdout","cluster-id":"8","file":"/go/pkg/log/iolog.go:37","func":"github.com/codership/galera-manager/pkg/log.(*IOLog).Write","host-id":"2","level":"info","msg":"Errors during downloading metadata for repository 'galera-manager':\r\n  - Curl error (6): Couldn't resolve host name for https://repo.galera-manager.com/nexus/repository/galera-manager-release/repodata/repomd.xml [Could not resolve host: repo.galera-manager.com]\r\n","ssh-host":"10.102.48.38","time":"2023-11-07T11:26:38Z"}

Could you please check your DNS settings?
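One way to confirm the DNS diagnosis on a node is to resolve the host names from those log lines directly. This Python sketch uses only the standard library; the helper name is hypothetical, and port 443 is an arbitrary choice since getaddrinfo only needs some service to look up.

```python
import socket


def unresolvable(hosts, port=443):
    """Return the host names for which local DNS resolution fails."""
    failed = []
    for host in hosts:
        try:
            socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            failed.append(host)
    return failed


# Host names taken from the failing log lines above
HOSTS_FROM_LOGS = ["downloads.mariadb.com", "repo.galera-manager.com"]
```

unresolvable(HOSTS_FROM_LOGS) returns an empty list when local DNS works. On a host that reaches the internet only through an HTTP proxy, a failed local lookup is not necessarily a fault: when http_proxy is set, clients such as curl and wget pass the host name to the proxy, which resolves it.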

@alokispandey
Copy link
Author

The system is behind a proxy; the installer should either have an option to set a proxy or honour the http_proxy environment variable.
I tested it by exporting http_proxy="http://my.proxy.com:myport" and running curl against that URL, and it works fine.

@alokispandey
Copy link
Author

I managed to pass the proxy to the Galera installer by exporting the http_proxy variable in /etc/bashrc. The installer now makes progress, but it is failing at:

root@10.112.48.70# mysqladmin -u root status
Nov 08, 2023 14:42:06 | stdout | mysqladmin: connect to server at 'localhost' failed
Nov 08, 2023 14:42:06 | stdout | error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)'
Nov 08, 2023 14:42:06 | stdout | Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!
Nov 08, 2023 14:42:06 | galera-manager | mysqld is apparently not running
Nov 08, 2023 14:42:06 | galera-manager | setting cluster-wide Lock (to avoid race conditions with the first node)
Nov 08, 2023 14:42:06 | galera-manager | starting as a first node
Nov 08, 2023 14:42:06 | galera-manager | checking grastate
Nov 08, 2023 14:42:06 | galera-manager | running start script
Nov 08, 2023 14:42:06 | galera-manager | root@10.112.48.70# echo -n "test"
Nov 08, 2023 14:42:06 | stdout | test
Nov 08, 2023 14:42:06 | galera-manager | Default Galera version is 4
Nov 08, 2023 14:42:06 | galera-manager | Including custom config directory from my.cnf
Nov 08, 2023 14:42:06 | galera-manager | Writing to /etc/mysql/wsrep/conf.d/99.galera.cnf
Nov 08, 2023 14:42:06 | stdout | 10.112.48.70:22$ bash -c '[ -f /var/lib/mysql/grastate.dat ] && sed -i '"'"'s/safe_to_bootstrap: .*/safe_to_bootstrap: 1/'"'"' /var/lib/mysql/grastate.dat || true'
Nov 08, 2023 14:42:06 | galera-manager | Will fix grastate.dat (if required)
Nov 08, 2023 14:42:06 | galera-manager | Running the first node in the cluster
Nov 08, 2023 14:42:06 | stdout | 10.112.48.70:22$ galera_new_cluster
Nov 08, 2023 14:42:07 | stdout | Job for mariadb.service failed because the control process exited with error code.
Nov 08, 2023 14:42:07 | stdout | See "systemctl status mariadb.service" and "journalctl -xe" for details.
Nov 08, 2023 14:42:07 | galera-manager | Got an error and attepts = 0

Full console logs:
galera-console output.txt

Host logs:
cluster-12-host-2.log

Hope the above logs may provide you more insight to what is going wrong.

@alokispandey
Author

I tried on a fresh Red Hat 8 node with MySQL 8 instead of MariaDB. It still fails at the same point.

, 2023 13:52:46 | galera-manager | checking node status
Nov 09, 2023 13:52:46 | galera-manager | root@10.112.48.70# mysqladmin -u root status
Nov 09, 2023 13:52:46 | stdout | mysqladmin: connect to server at 'localhost' failed
Nov 09, 2023 13:52:46 | stdout | error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)'
Nov 09, 2023 13:52:46 | stdout | Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!
Nov 09, 2023 13:52:46 | galera-manager | mysqld is apparently not running
Nov 09, 2023 13:52:46 | galera-manager | setting cluster-wide Lock (to avoid race conditions with the first node)
Nov 09, 2023 13:52:46 | galera-manager | starting as a first node
Nov 09, 2023 13:52:46 | galera-manager | checking grastate
Nov 09, 2023 13:52:46 | galera-manager | running start script
Nov 09, 2023 13:52:46 | galera-manager | root@10.112.48.70# echo -n "test"
Nov 09, 2023 13:52:46 | stdout | test
Nov 09, 2023 13:52:46 | galera-manager | Default Galera version is 4
Nov 09, 2023 13:52:46 | galera-manager | Including custom config directory from my.cnf
Nov 09, 2023 13:52:46 | galera-manager | Writing to /etc/mysql/wsrep/conf.d/99.galera.cnf
Nov 09, 2023 13:52:46 | galera-manager | Will fix grastate.dat (if required)
Nov 09, 2023 13:52:46 | stdout | 10.112.48.70:22$ bash -c '[ -f /var/lib/mysql/grastate.dat ] && sed -i '"'"'s/safe_to_bootstrap: .*/safe_to_bootstrap: 1/'"'"' /var/lib/mysql/grastate.dat || true'
Nov 09, 2023 13:52:46 | galera-manager | Running the first node in the cluster
Nov 09, 2023 13:52:46 | stdout | 10.112.48.70:22$ bash -c 'systemctl set-environment MYSQLD_OPTS="--wsrep-new-cluster" && systemctl start mysqld && systemctl unset-environment MYSQLD_OPTS'
Nov 09, 2023 13:52:48 | stdout | Job for mysqld.service failed because the control process exited with error code.
Nov 09, 2023 13:52:48 | stdout | See "systemctl status mysqld.service" and "journalctl -xe" for details.
Nov 09, 2023 13:52:48 | galera-manager | Got execution failure, will retry (attempts left 2)
Nov 09, 2023 13:52:48 | galera-manager | Got an error and attepts = 3
Nov 09, 2023 13:52:53 | stdout | 10.112.48.70:22$ bash -c 'systemctl set-environment MYSQLD_OPTS="--wsrep-new-cluster" && systemctl start mysqld && systemctl unset-environment MYSQLD_OPTS'
Nov 09, 2023 13:52:55 | stdout | Job for mysqld.service failed because the control process exited with error code.
Nov 09, 2023 13:52:55 | stdout | See "systemctl status mysqld.service" and "journalctl -xe" for details.
Nov 09, 2023 13:52:55 | galera-manager | Got an error and attepts = 2
Nov 09, 2023 13:52:55 | galera-manager | Got execution failure, will retry (attempts left 1)
Nov 09, 2023 13:53:00 | stdout | 10.112.48.70:22$ bash -c 'systemctl set-environment MYSQLD_OPTS="--wsrep-new-cluster" && systemctl start mysqld && systemctl unset-environment MYSQLD_OPTS'
Nov 09, 2023 13:53:03 | stdout | Job for mysqld.service failed because the control process exited with error code.
Nov 09, 2023 13:53:03 | stdout | See "systemctl status mysqld.service" and "journalctl -xe" for details.
Nov 09, 2023 13:53:03 | galera-manager | Got an error and attepts = 1
Nov 09, 2023 13:53:03 | galera-manager | SshHost.RunScript error: command failed (stepName=run_cluster_first, commandId=3, commandType=ExecCommand): Process exited with status 1failed to execute cluster config script (RunScriptWithConn)
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScriptWithConn
/go/pkg/internal/sshcmd/executor.go:115
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScript
/go/pkg/internal/sshcmd/executor.go:171
github.com/codership/galera-manager/pkg/internal/mgmt/units.(*Node).Start
/go/pkg/internal/mgmt/units/node.go:483
github.com/codership/galera-manager/pkg/internal/mgmt.(*Nodes).Start.func1
/go/pkg/internal/mgmt/nodes.go:180
github.com/codership/galera-manager/pkg/internal/jobs.(*Processor).Execute.func1
/go/pkg/internal/jobs/processor.go:90
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1594
Nov 09, 2023 13:53:03 | galera-manager | Exit status is not 0. Database engine start failure?
Nov 09, 2023 13:53:03 | galera-manager | error starting the node

@ayurchen
Member

ayurchen commented Nov 9, 2023

Hi.

The key is this line:

 See "systemctl status mysqld.service" and "journalctl -xe" for details.

That means mysqld failed to start for its own internal reasons (most likely misconfiguration), and we need to see its error log (the output of the aforementioned commands may be helpful, but is limited).

Please post mysqld error log (normally /var/lib/mysql/mysqld.err or /var/log/mysql/error.log)
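To pull just the error lines out of the mysqld error log, a small Python sketch can try the default locations named above in order. The path list comes from this comment and may need adjusting if log_error in my.cnf points elsewhere; the helper names are hypothetical.

```python
from pathlib import Path

# The two default locations mentioned above
CANDIDATE_LOGS = ["/var/lib/mysql/mysqld.err", "/var/log/mysql/error.log"]


def first_existing(paths):
    """Return the first path that exists as a regular file, or None."""
    for p in map(Path, paths):
        if p.is_file():
            return p
    return None


def error_lines(text):
    """Keep only lines tagged [ERROR], the marker MariaDB and MySQL use for errors."""
    return [line for line in text.splitlines() if "[ERROR]" in line]


if __name__ == "__main__":
    log = first_existing(CANDIDATE_LOGS)
    if log is None:
        print("no error log found in", CANDIDATE_LOGS)
    else:
        print("\n".join(error_lines(log.read_text(errors="replace"))))
```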

@alokispandey
Author

Hi Alexey,
The problem is that when deploying a node with Galera Manager, the DB service is not configured properly. The installer searches for a package that is not part of its repo:
All matches were filtered out by modular filtering for argument: galera
Nov 10, 2023 04:45:44 | stdout | Package socat-1.7.4.1-1.el8.x86_64 is already installed.
Nov 10, 2023 04:45:44 | stdout | Error: Unable to find a match: galera

Available repos:
[root@nhtlaspsitcdb03 ~]# dnf repolist
Updating Subscription Management repositories.
repo id repo name
ansible-2.8-for-rhel-8-x86_64-rpms Red Hat Ansible Engine 2.8 for RHEL 8 x86_64 (RPMs)
ansible-2.9-for-rhel-8-x86_64-rpms Red Hat Ansible Engine 2.9 for RHEL 8 x86_64 (RPMs)
codeready-builder-for-rhel-8-x86_64-rpms Red Hat CodeReady Linux Builder for RHEL 8 x86_64 (RPMs)
epel Extra Packages for Enterprise Linux 8 - x86_64
galera-manager Galera Manager
influxdb InfluxDB Repository - RHEL 8
rhel-8-for-x86_64-appstream-rpms Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
rhel-8-for-x86_64-baseos-rpms Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)
rhel-8-for-x86_64-supplementary-rpms Red Hat Enterprise Linux 8 for x86_64 - Supplementary (RPMs)
[root@nhtlaspsitcdb03 ~]#

@byte
Contributor

byte commented Nov 20, 2023


This suggests you're missing the repository for Galera: even though the galera-manager repo is there, you're not seeing the Galera server package. Can you paste the contents of the galera-manager repo file (and, for good measure, the influxdb repo file) and make sure that every URL they reference can be accessed through your firewall?
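To compile the list of hosts that must be reachable through the firewall or proxy, the URLs in the .repo files can be extracted mechanically. This Python sketch parses repo files with configparser (their format is INI-like) and collects the baseurl, mirrorlist, metalink and gpgkey hosts; the helper name is hypothetical. Treat the output as a starting point rather than a complete whitelist, since a repository may redirect to mirrors or CDNs on other domains.

```python
import configparser
import glob
from urllib.parse import urlparse


def repo_hosts(repo_text):
    """Return the sorted host names referenced by a yum/dnf .repo file."""
    # interpolation=None leaves $releasever etc. untouched; strict=False tolerates duplicates
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    hosts = set()
    for section in parser.sections():
        for key in ("baseurl", "mirrorlist", "metalink", "gpgkey"):
            # baseurl and gpgkey may hold several whitespace-separated URLs
            for url in parser.get(section, key, fallback="").split():
                host = urlparse(url).hostname
                if host:  # file:// URLs have no host and are skipped
                    hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    for path in sorted(glob.glob("/etc/yum.repos.d/*.repo")):
        try:
            with open(path) as f:
                print(path, repo_hosts(f.read()))
        except (OSError, configparser.Error) as exc:
            print(path, "could not be parsed:", exc)
```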

@alokispandey
Author

alokispandey commented Nov 21, 2023

Hi,

I've made a fresh start on 3 vanilla Red Hat 8 nodes. The Galera installation was successful, but adding nodes is still failing.

Deployment logs from Galera console:

Nov 21, 2023 10:53:01 | galera-manager | Default Galera version is 4
Nov 21, 2023 10:53:01 | galera-manager | Including custom config directory from my.cnf
Nov 21, 2023 10:53:01 | stdout         | 10.102.48.39:22$ bash -c '[ -f /var/lib/mysql/grastate.dat ] && sed -i '"'"'s/safe_to_bootstrap: .*/safe_to_bootstrap: 1/'"'"' /var/lib/mysql/grastate.dat || true'
Nov 21, 2023 10:53:01 | galera-manager | Will fix grastate.dat (if required)
Nov 21, 2023 10:53:01 | galera-manager | Running the first node in the cluster
Nov 21, 2023 10:53:01 | stdout         | 10.102.48.39:22$ galera_new_cluster
Nov 21, 2023 10:53:01 | stdout         | Job for mariadb.service failed because the control process exited with error code.
Nov 21, 2023 10:53:01 | stdout         | See "systemctl status mariadb.service" and "journalctl -xe" for details.
Nov 21, 2023 10:53:01 | galera-manager | Got an error and attepts = 0
Nov 21, 2023 10:53:01 | galera-manager | SshHost.RunScript error: command failed (stepName=run_cluster_first, commandId=3, commandType=ExecCommand): Process exited with status 1failed to execute cluster config script (RunScriptWithConn)
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScriptWithConn
	/go/pkg/internal/sshcmd/executor.go:115
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScript
	/go/pkg/internal/sshcmd/executor.go:171
github.com/codership/galera-manager/pkg/internal/mgmt/units.(*Node).Start
	/go/pkg/internal/mgmt/units/node.go:483
github.com/codership/galera-manager/pkg/internal/mgmt.(*Nodes).Start.func1
	/go/pkg/internal/mgmt/nodes.go:180
github.com/codership/galera-manager/pkg/internal/jobs.(*Processor).Execute.func1
	/go/pkg/internal/jobs/processor.go:90
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1594
Nov 21, 2023 10:53:01 | galera-manager | Exit status is not 0. Database engine start failure?
Nov 21, 2023 10:53:01 | galera-manager | error starting the node


The mariadb service status shows a permission issue with telegraf:


-21T10:54:17Z I! [inputs.execd] Starting process: /usr/local/bin/mysql_wsrep [-config /etc/telegraf/mysql_wsrep-telegraf-plugin.conf]
-21T10:54:17Z E! [inputs.execd] stderr: "Err loading input: open /etc/telegraf/mysql_wsrep-telegraf-plugin.conf: permission denied"
-21T10:54:17Z E! [inputs.execd] Process /usr/local/bin/mysql_wsrep exited: exit status 1
-21T10:54:17Z I! [inputs.execd] Restarting in 10s...


I changed the ownership (chown -R telegraf: /etc/telegraf) and ran systemctl restart mariadb.service, but it is still failing:
systemctl status mariadb.service

● mariadb.service - MariaDB 10.6.16 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: exit-code) since Tue 2023-11-21 10:55:54 GMT; 34s ago
     Docs: man:mariadbd(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 7967 ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 7838 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; >
  Process: 7836 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 7967 (code=exited, status=1/FAILURE)

Nov 21 10:55:53 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: Starting MariaDB 10.6.16 database server...
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl sh[7840]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl mariadbd[7967]: [99B blob data]
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl mariadbd[7967]: Fatal error in defaults handling. Program aborted
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: mariadb.service: Failed with result 'exit-code'.
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: Failed to start MariaDB 10.6.16 database server.

The mariadb.service drop-in file is blank:

 cat /etc/systemd/system/mariadb.service.d/migrated-from-my.cnf-settings.conf
# converted using /usr/bin/mariadb-service-convert
#

[Service]



Could the installer be broken?

Sharing the repository details as requested:

Galera client:


[root@nhtlaspsitcdb03 ~]# cat /etc/yum.repos.d/
galera-manager.repo  influxdb.repo        mariadb.repo         redhat.repo
[root@nhtlaspsitcdb03 ~]# cat /etc/yum.repos.d/{g,i,m}*.repo
[galera-manager]
name = Galera Manager
baseurl = https://repo.galera-manager.com/nexus/repository/galera-manager-release
gpgcheck = 0
[influxdb]
name = InfluxDB Repository
baseurl = https://repos.influxdata.com/rhel/8/x86_64/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key

[mariadb-main]
name = MariaDB Server
baseurl = https://dlm.mariadb.com/repo/mariadb-server/10.6/yum/rhel/8/x86_64
gpgkey = file:///etc/pki/rpm-gpg/MariaDB-Server-GPG-KEY
gpgcheck = 1
enabled = 1
module_hotfixes = 1


Galera Server:

[root@~]# ls /etc/yum.repos.d/
epel-modular.repo  epel.repo  epel-testing-modular.repo  epel-testing.repo  galera-manager.repo  influxdb.repo  redhat.repo
[root@nhtlaspsitcdb04 ~]# cat /etc/yum.repos.d/{g,i}*.repo
[galera-manager]
name = Galera Manager
baseurl = https://repo.galera-manager.com/nexus/repository/galera-manager-release
gpgcheck = 0
[influxdb]
name = InfluxDB Repository - RHEL $releasever
baseurl = https://repos.influxdata.com/rhel/8/$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key


@alokispandey
Author

alokispandey commented Dec 5, 2023

Finally, it works.

https://galeracluster.com/2023/11/galera-manager-november-2023-release-now-includes-deployment-and-monitoring-for-percona-xtradb-cluster-pxc-8-0/
"gm-installer version 1.11.0" finally works. However, it was not a one-click deployment, and I had to make some amendments. I am sharing the details below in case someone finds them useful.

  • Use a vanilla VM with no pre-configuration.
  • If you are running behind a proxy, ensure the domains below are whitelisted:
a.	.influxdata.com
b.	.galeracluster.com
c.	.galera-manager.com
d.	.mariadb.com
e.	.github.com
  • Set up the firewall in advance, as GM fails to set it up properly:
    firewall-cmd --add-port={3306/tcp,4567/tcp,4567/udp,4568/tcp,4444/tcp} --permanent; firewall-cmd --reload; firewall-cmd --list-all
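For reference, the one-liner above opens the standard Galera ports. The Python sketch below spells out the same firewall-cmd invocations with a comment per port; it prints the commands instead of running them, and the helper name is hypothetical. The ports need to be open on every database node.

```python
GALERA_PORTS = (
    "3306/tcp",  # MySQL/MariaDB client connections
    "4567/tcp",  # Galera group communication (replication)
    "4567/udp",  # Galera replication over multicast, if used
    "4568/tcp",  # Incremental State Transfer (IST)
    "4444/tcp",  # State Snapshot Transfer (SST)
)


def firewall_commands(ports=GALERA_PORTS):
    """Build the firewall-cmd invocations equivalent to the one-liner above."""
    # firewall-cmd accepts --add-port repeatedly; bash brace expansion produces the same list
    add = ["firewall-cmd", "--permanent"] + [f"--add-port={p}" for p in ports]
    return [add, ["firewall-cmd", "--reload"], ["firewall-cmd", "--list-all"]]


if __name__ == "__main__":
    # Print rather than execute, so the commands can be reviewed (and run as root) first
    for cmd in firewall_commands():
        print(" ".join(cmd))
```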

Galera-manager installer issues

Fails to manage the HTTP proxy
Most systems run behind a proxy, and gm-installer fails to use the proxy defined after installation. The GMD service then fails to start, as it needs outbound internet access (in this thread it timed out looking up checkip.amazonaws.com). Ensure the systemd service file defines the proxy using the "Environment" directive, as shown below:

cat /usr/lib/systemd/system/gmd.service
[Unit]
Description=gmd - galera manager daemon
After=network.target

[Service]
EnvironmentFile=/etc/default/gmd
User=gmd
Group=gmd
LimitNOFILE=65536
Restart=on-failure
Type=simple
ExecStart=/usr/bin/gmd run $ARGS
Environment=https_proxy=http://my.env.proxy.com:proxyport
Environment=http_proxy=http://my.env.proxy.com:proxyport

[Install]
WantedBy=default.target

Galera-manager deployment issues

GM installer fails to use the HTTP proxy defined in /etc/profile or exported in the http_proxy variable

If you are deploying to nodes behind a proxy, export http_proxy and https_proxy in /etc/bashrc so the GM installer can use them to perform the deployment. You can clean this up after deployment and define the proxy in the dnf repo files instead.

Galera-manager after deployment issues

1. GM installer fails to start services

Once GM is installed, it fails to start the service at the final step of database deployments. Ensure the following:

  • /etc/telegraf is owned by user telegraf
  • /etc/mysql is owned by user mysql
  • After fixing ownership, restart the service manually.
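The ownership checklist above can be verified before restarting with a short Python sketch. It checks only the top-level directories (chown -R fixes the whole tree), and the helper names are hypothetical.

```python
import pwd
from pathlib import Path

# Expected owners from the checklist above
EXPECTED_OWNERS = {"/etc/telegraf": "telegraf", "/etc/mysql": "mysql"}


def owner_name(path):
    """Return the user name owning path, or the numeric uid if it has no passwd entry."""
    uid = Path(path).stat().st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def wrong_owners(expected=EXPECTED_OWNERS):
    """Map each existing path whose owner differs from the expected user to its actual owner."""
    return {
        path: owner_name(path)
        for path, user in expected.items()
        if Path(path).exists() and owner_name(path) != user
    }


if __name__ == "__main__":
    print(wrong_owners() or "ownership looks as expected")
```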

2. Other Errors

  • [ERROR] Incorrect definition of table mysql.column_stats: expected column 'hist_type' at position 9 to have type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB'), found type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB').

To fix it, use ALTER TABLE, e.g.:

  • ALTER TABLE mysql.column_stats MODIFY hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB');
  • ALTER TABLE mysql.column_stats MODIFY histogram longblob;

  • Telegraf error: Unknown column 'status' in 'where clause'

To fix it, follow influxdata/telegraf#7968.

Best of luck!
