@sanjaysrikakulam
Member

@sanjaysrikakulam sanjaysrikakulam commented Jul 18, 2025

See issue: https://github.com/usegalaxy-eu/issues/issues/754
Xref:

  1. ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/miniconda/envs/_galaxy_/lib/python3.11/lib-dynload/../.././libicuuc.so.75) galaxyproject/ansible-galaxy#228
  2. Conda accept ToS, see issue (ToDo). Note: Because I deployed the playbook manually, I was able to add an Ansible task to the role that accepts the ToS as described in the linked issue, so this should not fail. It still needs to be fixed upstream, and the role version should be bumped afterwards. We will likely revisit this during the dual-head node setup.
  3. DNS: Create DNS entries for the new infra 4-in-1 nodes. infrastructure#239
  4. TIaaS2 role error (solution: add tiaas_galaxy_stylesheet: '{{ galaxy_server_dir }}/static/dist/base.css' to the group_vars see here for more details)
TASK [galaxyproject.tiaas2 : Copy Galaxy's stylesheet] ***************************************************************************************************************
fatal: [sn09.galaxyproject.eu]: FAILED! => changed=false 
  msg: Source /opt/galaxy/server/static/style/base.css not found
  5. Issues installing requirements.yaml (ToDo):
  - name: linuxhq.yum_cron
    version: master

This does not appear to exist anymore on Ansible Galaxy, and the role is still used by a few playbooks. Currently, I have commented out this role in my manual deployment process, but when the CI is set up, we will likely encounter issues and the CI will fail with an error stating that this role is not available. We should investigate what this role is doing and find alternatives. -- Update: This is a local role, so I have removed it from the requirements file and added a commit to this PR.
6. galaxyproject/ansible-galaxy#229 -- Update: Used the latest commit hash from this repo in the requirements file.
7. Updated the NodeJS version: gie_proxy_nodejs_version: '22.0.0'. On sn06, Node appears to have been updated locally/manually to 21.1.0 so that the gie-proxy works with the PostgreSQL DB, but unfortunately this was never reflected in the sn06 group vars file.
8. Updated the usegalaxy_eu.tpv_auto_lint role to 0.4.5 due to this issue, which I had to find the hard way.
9. #1585
10. usegalaxy-eu/infrastructure#245
11. I manually installed flower==2.0.1 for Celery in Galaxy's venv on sn09. (This is likely unnecessary once the celery CI and the nfs-sync cron job have run, but it avoids surprises during the migration.)

Before deployment/migration:

  • Check nginx files and see if something on the disk is being referenced (alias) (e.g., /opt/phinch has a dedicated route on galaxy-main nginx conf; this needs to be rsynced as well)
  • Stop all CI jobs
  • Pre rsync:
    • /opt/galaxy/mutable-data (except the socket files)
    • /opt/galaxy/mutable-config
    • /opt/galaxy/shed_tools
    • /opt/galaxy/tool-data
    • /opt/phinch
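
The nginx check in the first bullet can be partly automated. The following is only a minimal sketch, assuming the nginx configuration is available as text and that on-disk references show up as single-line, unquoted `alias` or `root` directives. The function name `find_disk_paths` and the sample configuration are hypothetical; the real galaxy-main config will differ.

```python
import re

# Matches "alias /some/path;" or "root /some/path;" directives.
# Assumption: paths are unquoted and each directive sits on one line.
DIRECTIVE_RE = re.compile(r"^\s*(alias|root)\s+([^;\s]+)\s*;", re.MULTILINE)


def find_disk_paths(nginx_conf_text):
    """Return sorted, unique (directive, path) pairs found in nginx config text."""
    return sorted(set(DIRECTIVE_RE.findall(nginx_conf_text)))


# Hypothetical sample config, for illustration only.
sample = """
location /phinch {
    alias /opt/phinch;
}
location /static {
    alias /opt/galaxy/server/static;
}
# alias /opt/old;
"""

for directive, path in find_disk_paths(sample):
    print(directive, path)
```

Every path reported this way is a candidate for the pre-rsync list; commented-out directives are skipped because the pattern only matches lines that start with the directive itself.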

After deployment/migration:

  • From @bgruening (see handler and workflow handler migration steps below)
  • Enable roles during final deployment: usegalaxy-eu.rsync-to-nfs (only once we know everything is working fine do this)
  • Confirm that sn06 and sn09 have the same crontab entries (galaxy and root user accounts). I have already checked this and have currently commented out the entries for the galaxy user account, so enable them if they are still commented out after the final deployment.
  • Increase the Gunicorn, job handler, and workflow scheduler counts (set them to what sn06 has)
  • Job handler migration:
    • gxadmin mutate reassign-job-to-handler (Note: we need the job IDs for this; look up the new, queued, and running jobs, collect their IDs, and loop over them with this gxadmin command)
  • Workflow scheduler handler migration: (workflow handler IDs from the Galaxy DB: workflow_scheduler_sn06_0, workflow_scheduler_sn06_1, workflow_scheduler_sn06_2; we have 3 systemd services, so 3 handlers)
    • gxadmin mutate reassign-workflows-to-handler
    • gxadmin mutate reassign-active-workflows-to-handler
  • Final full rsync:
    • /opt/galaxy/mutable-data (except the socket files)
    • /opt/galaxy/mutable-config
    • /opt/galaxy/shed_tools
    • /opt/galaxy/tool-data
    • /opt/phinch
    • /home/stats
  • Update everywhere sn06 is referred to (check the playbook repo, infrastructure repo, and operations)
  • Update sn05 to sn11 everywhere (playbook repo, infrastructure repo, and operations) (Xref: Update sn05 to sn11 hostname in assorted places #1586)
  • Update any CI jobs to run on sn09 (tool installation and maybe there are other CI jobs)
  • Run the nfs sync job from sn09 and only then start the celery node and its services (the celery node has mounted the synced data from sn06, so I think if we want them all properly working along with its SQLite, we should do this (check with @bgruening)).
  • CI jobs:
    • Enable all the disabled projects
    • Deploy all the projects that were using sn05, so they all talk to the new DB server on sn11.
  • Merge traefik PR and deploy its CI.
  • Merge this PR in the infrastructure repo.
  • Reenable the cron tasks on the maintenance node and the crond service as well.
  • Remove the broadcast notification and post a new one. Similarly, update Wolfgang's post on the Galaxy Help forum.
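
The job handler migration step above amounts to looping one gxadmin call over a list of job IDs. A rough sketch of that loop follows. Several things here are assumptions to verify before use: the argument order `<job_id> <handler_id>` and the `--commit` flag are as I recall them from the gxadmin help and should be checked with `gxadmin mutate reassign-job-to-handler --help`; the handler names `handler_sn09_0` and `handler_sn09_1` and the job IDs are hypothetical; and spreading jobs round-robin over handlers is my choice, not something decided in this PR.

```python
from itertools import cycle


def build_reassign_commands(job_ids, handler_ids, commit=False):
    """Build one gxadmin command per job, spreading jobs round-robin over handlers.

    Assumed CLI shape: gxadmin mutate reassign-job-to-handler <job_id> <handler_id> [--commit]
    """
    commands = []
    for job_id, handler in zip(job_ids, cycle(handler_ids)):
        cmd = ["gxadmin", "mutate", "reassign-job-to-handler", str(job_id), handler]
        if commit:
            cmd.append("--commit")
        commands.append(cmd)
    return commands


# Hypothetical job IDs (new/queued/running jobs collected from the Galaxy DB)
# and hypothetical target handler names.
job_ids = [101, 102, 103]
handlers = ["handler_sn09_0", "handler_sn09_1"]

for cmd in build_reassign_commands(job_ids, handlers):
    print(" ".join(cmd))
```

The sketch only prints the commands. Reviewing that output first and then re-running with `commit=True` (and actually executing the commands) keeps the reassignment a deliberate, separate step.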

@sanjaysrikakulam sanjaysrikakulam marked this pull request as draft July 18, 2025 10:02
@sanjaysrikakulam
Member Author

sanjaysrikakulam commented Jul 24, 2025

Rsync dry run tests:

Commands:

rsync -auvri --size-only --dry-run --exclude node_modules --exclude .git --exclude __pycache__ -e "ssh -i ~/.ssh/cloud2_bw.key" /opt/galaxy/gie-proxy/ root@10.4.68.201:/opt/galaxy/gie-proxy/

Change the source and destination directories as needed; this is just an example.
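
To avoid hand-editing the command for each directory, the same dry run can be generated per directory. This is a minimal sketch that reuses the flags, key path, and host from the example above. The helper name `build_rsync_dry_run` is hypothetical, and `*.sock` as the exclude pattern for the socket files is an assumption.

```python
import shlex


def build_rsync_dry_run(directory, remote, ssh_key, excludes=()):
    """Return the rsync dry-run argument list for one directory."""
    # Trailing slash: compare the directory contents, not the directory itself.
    path = directory.rstrip("/") + "/"
    cmd = ["rsync", "-auvri", "--size-only", "--dry-run"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    cmd += ["-e", f"ssh -i {ssh_key}", path, f"{remote}:{path}"]
    return cmd


# Directories from the pre-rsync list, each with its exclude patterns.
directories = {
    "/opt/galaxy/mutable-data": ["*.sock"],
    "/opt/galaxy/mutable-config": [],
    "/opt/galaxy/shed_tools": [],
    "/opt/galaxy/tool-data": [],
    "/opt/phinch": [],
}

for directory, excludes in directories.items():
    cmd = build_rsync_dry_run(directory, "root@10.4.68.201", "~/.ssh/cloud2_bw.key", excludes)
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell as they are; dropping `--dry-run` from the list turns the same command into the real sync.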

Diff test for a few files:

diff galaxy.yml <(ssh -i ~/.ssh/cloud2_bw.key root@10.4.68.201 "cat /opt/galaxy/config/galaxy.yml")
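
The same comparison can be scripted when several files need checking. A minimal sketch, assuming key-based ssh access as in the command above; the helper names `fetch_remote` and `unified_diff` and the sample strings are hypothetical, and only the diff part is exercised in the demo.

```python
import difflib
import os
import shlex
import subprocess


def fetch_remote(path, remote, ssh_key):
    """Read a remote file over ssh and return its text (raises if ssh fails)."""
    result = subprocess.run(
        ["ssh", "-i", os.path.expanduser(ssh_key), remote, f"cat {shlex.quote(path)}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def unified_diff(local_text, remote_text, name):
    """Return unified-diff lines between local and remote text; empty if identical."""
    return list(
        difflib.unified_diff(
            local_text.splitlines(),
            remote_text.splitlines(),
            fromfile=f"local/{name}",
            tofile=f"remote/{name}",
            lineterm="",
        )
    )


# Demo with in-memory strings (hypothetical content, no network access needed).
local = "a: 1\nb: 2\n"
remote = "a: 1\nb: 3\n"
for line in unified_diff(local, remote, "galaxy.yml"):
    print(line)
```

For a real check, `remote` would come from something like `fetch_remote("/opt/galaxy/config/galaxy.yml", "root@10.4.68.201", "~/.ssh/cloud2_bw.key")`.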

Rsync test results:

  1. /opt/galaxy/config:
  • markdown_export.css: present on sn06 but not in the playbook repo, and not active in galaxy.yml (see below).

In gxconfig.yml:

```YAML
  # CSS file to apply to all Markdown exports to PDF - currently used by
  # WeasyPrint during rendering an HTML export of the document to PDF.
  # The value of this option will be resolved with respect to
  # <config_dir>.
  #markdown_export_css: markdown_export.css
```
  1. /opt/galaxy/custom-tools:
  • We likely need a complete rsync of the custom-tools dir, as there are many files and subdirectories on sn06.
  1. /opt/galaxy/datasets:
  • An empty directory on sn06, so no action needs to be taken. (The directory itself has been created on sn09.)
  1. /opt/galaxy/gie-proxy (rsync -auvri --size-only --dry-run --exclude node_modules --exclude .git --exclude __pycache__ --exclude node* --exclude python3* -e "ssh -i ~/.ssh/cloud2_bw.key" /opt/galaxy/gie-proxy/ root@10.4.68.201:/opt/galaxy/gie-proxy/)
  • I excluded the Node and Python stuff as there will definitely be differences (Node version itself is different, so...)

  • The ones below are on sn06 but not on sn09 (these look like Node build artifacts and are probably not needed, because the galaxy-gie-proxy service starts and runs fine)

    cd+++++++++ proxy/build/
    <f+++++++++ proxy/build/config.gypi
    ...
    ...
    cd+++++++++ venv/share/systemtap/
    cd+++++++++ venv/share/systemtap/tapset/
    
  1. /opt/galaxy/local_tools:
  • An empty directory on sn06, so no action needs to be taken. (The directory itself has been created on sn09.)
  1. /opt/galaxy/mutable-config:
  • Observed changes in size (s), timestamp (t), and permissions (p). Note: the list below omits files that have had no updates since 2018 or 2019.

         shed_data_manager_conf.xml
    
    <f.st...... integrated_tool_panel.xml
    <f.stp..... shed_tool_conf.xml
    <f+++++++++ shed_tool_conf.xml.bak
    <f.st...... shed_tool_data_table_conf.xml
    
  1. /opt/galaxy/mutable-data:
  • Too many files on sn06. We likely need to rsync the entire directory (except the *.sock files)
  1. /opt/galaxy/server:
  • Needs inspection (my assumption is that we should not rely on local files on sn06 for this and only on our forked repo).
  1. /opt/galaxy/shed_tools:
  • A complete rsync.
  1. /opt/galaxy/shed_tools-local -> shed_tools:
  • We should not have this symlink, I think, so this can be ignored (double-check the sn06/sn09 gxconfig.yml files and ensure that nothing references shed_tools-local, only shed_tools)
  1. /opt/galaxy/tool-data:
  • A complete rsync.
  1. /opt/galaxy/venv:
  • Can be ignored, as this should also be created automatically by the playbook, unless someone made local changes, which I hope nobody did.
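
For reading the rsync `-i` (itemize-changes) output quoted in the results above, here is a small decoder sketch. It assumes the 11-character `YXcstpoguax` layout described in the rsync man page and that the flags contain no spaces; the function name `parse_itemize` is hypothetical, and the meaning of the `u` slot depends on the rsync version.

```python
UPDATE_TYPES = {
    "<": "sent",
    ">": "received",
    "c": "created/changed locally",
    "h": "hard link",
    ".": "not updated",
    "*": "message",
}
FILE_TYPES = {"f": "file", "d": "directory", "L": "symlink", "D": "device", "S": "special"}
# One name per attribute slot, in the order c s t p o g u a x.
ATTRIBUTES = ["checksum", "size", "mtime", "permissions", "owner", "group", "u-slot", "acl", "xattr"]


def parse_itemize(line):
    """Decode one line of rsync --itemize-changes output into a dict."""
    flags, _, path = line.strip().partition(" ")
    if len(flags) != 11:
        raise ValueError(f"not an itemize line: {line!r}")
    attrs = flags[2:]
    if set(attrs) == {"+"}:
        changed = ["new"]  # item does not exist on the destination yet
    else:
        changed = [name for name, ch in zip(ATTRIBUTES, attrs) if ch != "."]
    return {
        "update": UPDATE_TYPES.get(flags[0], "unknown"),
        "type": FILE_TYPES.get(flags[1], "unknown"),
        "changed": changed,
        "path": path.strip(),
    }


for sample in ("<f.stp..... shed_tool_conf.xml", "cd+++++++++ proxy/build/"):
    print(parse_itemize(sample))
```

With this, `<f.stp.....` reads as a file to be sent whose size, mtime, and permissions differ, and `cd+++++++++` as a directory that would be newly created on the destination.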

@bgruening
Member

galaxy@sn06:~$ cat config/markdown_export.css 
p {
}

img{
        width: 75%;
}

h2 {
    page-break-before: always;
}

/* style tables to look more LaTex-like
and take up less space
*/

pre {
    font-size: xx-small;
}

table {
    table-layout: fixed;
}

pre th {
    display: none;
    /* default formatting would be
    color: white;
    background: #25537b;
    */
}

pre th, td {
    font-size: xx-small;
    padding: 0.3rem;
    white-space: pre-wrap;
    width: 25px;
}

pre tr:nth-child(2) td {
    height: 50px;
    vertical-align: bottom;
    transform-origin: bottom left;
    transform: translateX(20px) rotate(-45deg);
}

@bgruening
Member

bgruening commented Jul 24, 2025

This would be nice to take into the playbook, assuming we will ever use it again:

  • /opt/galaxy/server/scripts/grt/grt.yml
  • /opt/galaxy/server/dependencies needs to be rsynced - unfortunately.

@bgruening
Member

bgruening commented Jul 24, 2025

This symlink might be needed if tool help images are not shown:

in /opt/galaxy ...

lrwxrwxrwx   1 root   root      10 Feb 12 17:17 shed_tools-local -> shed_tools

@bgruening
Member

Needs to go to the ansible file: /opt/galaxy/config/email_domain_blocklist.conf

@sanjaysrikakulam
Member Author

  • /opt/galaxy/server/dependencies needs to be rsynced - unfortunately.

This seems to be a symlink on sn06 (it is also created automatically on sn09).

on sn06

root@sn06:/opt/galaxy/config$ ll /opt/galaxy/server/dependencies
lrwxrwxrwx. 1 galaxy galaxy 17 Jan 14  2021 /opt/galaxy/server/dependencies -> /usr/local/tools/
root@sn06:/opt/galaxy/config$

on sn09

root@sn09:/opt/galaxy$ ll /opt/galaxy/server/dependencies
lrwxrwxrwx. 1 galaxy galaxy 17 Jul 23 12:45 /opt/galaxy/server/dependencies -> /usr/local/tools/
root@sn09:/opt/galaxy$

and /usr/local/tools is an NFS mount

/usr/local/tools                                -hard,rw,nosuid,nconnect=2,vers=3                               denbi.svm.bwsfs.uni-freiburg.de:/dnb01/tools
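
A quick way to confirm this on both hosts after deployment is to check the link target programmatically. A minimal sketch, assuming it runs locally on the host being checked; the helper name `check_symlink` is hypothetical, and the demo uses a temporary directory instead of the real paths.

```python
import os
import tempfile


def check_symlink(link_path, expected_target):
    """Return True if link_path is a symlink whose target equals expected_target."""
    if not os.path.islink(link_path):
        return False
    # normpath drops trailing slashes, so "/usr/local/tools/" == "/usr/local/tools".
    return os.path.normpath(os.readlink(link_path)) == os.path.normpath(expected_target)


# Demo in a temporary directory (stand-ins for the real paths).
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "tools")
    os.mkdir(target)
    link = os.path.join(tmp, "dependencies")
    os.symlink(target + "/", link)
    print(check_symlink(link, target))       # symlink pointing at the expected target
    print(check_symlink(target, target))     # a real directory, not a symlink
```

On the servers, the call would be `check_symlink("/opt/galaxy/server/dependencies", "/usr/local/tools/")`; `os.path.ismount("/usr/local/tools")` can additionally confirm that the NFS share is actually mounted.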

@sanjaysrikakulam
Member Author

Needs to go to the ansible file: /opt/galaxy/config/email_domain_blocklist.conf

This already seems to be in the playbook repo, because I see the file on sn09. So all good on this one.

@sanjaysrikakulam
Member Author

  • /opt/galaxy/server/scripts/grt/grt.yml

I just found that this is already in the playbook repo, but instead of copying it to /opt/galaxy/server/scripts/grt/grt.yml, the playbook templates it to /opt/galaxy/config/grt.yml (if this project is restarted, we can update the path in the sn09 group_vars). The file is in the same state on sn06.

@sanjaysrikakulam sanjaysrikakulam marked this pull request as ready for review July 24, 2025 16:09
@bgruening
Member

config is actually a better place for the grt file.

Contributor

@mira-miracoli mira-miracoli left a comment

Looks good overall, just some minor things (@bgruening already mentioned the toolmsg file)
