Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One-Click Installation Universal Script for MIMIC-III Clinical Database CareVue subset with PostgreSQL on MAC OS, Windows, and Linux #1471

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ningyile
Copy link

@ningyile ningyile commented Jan 20, 2023

Dear Administrator of MIMIC Team:

I have completed the re-edition of SQL codes to install MIMIC-III Clinical Database CareVue subset(I delete the SQL query for Metavision subset and convert BigQuery function to Postgresql function), and I have written a shell script that can install the database and Concepts with Postgresql on Mac, Linux, and Windows.

  • MIMIC new beginners only need to type bash install.sh in the terminal (For windows, the terminal was provided by bash.exe of Git software in C:\Program Files\Git\bin), and then the database can be automatically configured and installed. The whole process does not require any further operation.

  • Sepsis3 has been added to the Concepts, which was edited based on A Comparative Analysis of Sepsis Identification Methods in an Electronic Database. sepsis3-mimic created byAlistair Johnson

  • Charlson Comorbidity Index has been added to the Concepts, which was edited based onCharlson in Concepts of MIMICIV

  • The readme profile contains detailed illustrations, which should be clear enough for beginners. Although the process is simple, I plan to upload a tutorial for MAC Linux, and Windows on Youtube. But the video tutorial is still being edited, and the video link will be added to the readme profile later. I hope the PR I submitted can be accepted, if you have any further questions, please get in touch with me. I'm looking forward to hearing back from you.

@alistairewj @tompollard

Chris Ning
ningyile@qq.com

@ningyile ningyile changed the title One-Click Installation Universal Script for MIMIC-III Clinical Database CareVue subset on MAC OS, Windows, and Linux One-Click Installation Universal Script for MIMIC-III Clinical Database CareVue subset with PostgreSQL on MAC OS, Windows, and Linux Jan 20, 2023
Copy link
Member

@alistairewj alistairewj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clearly took a lot of effort and I am very appreciative of the PR!

I like the idea of a very easy installation (we've always tried to make it simpler). We did start off with a Makefile for creating MIMIC, but in the end I found it was very difficult to keep things friendly across operating systems, and opted for a fully SQL approach.

With that in mind, after a quick look, there are some important changes.

  • I'd rather not split mimic-iii into mimic-iii and mimic-iii-carevue. Since mimic-iii-carevue is a subset of mimic-iii, it's easier to keep the one folder and use it for mimic-iii-carevue. You should be able to use most of the mimic-iii scripts to build the data. The only catch is the validation script, which we might need a new version of.
  • You'll have to use the existing concepts, and not create new SQL files. That should only require judicious use of relative paths.
  • I don't like having a registry file to register a new menu item in Windows. Instead, I'd ask the user to install the bash software you need, and tell them they need to launch a terminal in the folder. That avoids the awkwardness of having a registry file and it means all the operating systems have a mostly equivalent starting point.
  • For install.sh, I've made comments inline above
  • Internationalization of bash scripts is possible with some additional work, which seems like a better approach than the back and forth if statements.

Before you go ahead and put more work into this though, I'd like to discuss this with some colleagues (@tompollard / @briangow) - I'm not 100% sure we'd want to commit to maintaining something like this. It is hard enough to keep this repo up to date as is!

user="postgres"
test_db_login=false
ptn=" reject$| md5$| password$| scram-sha-256$| gss$| sspi$| ident$| peer$| pam$| ldap$| radius$| cert$"
sed_ptn="s/$ptn/ trust/g"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to change the PostgreSQL security for inexperienced users to trust. Instead, try to get this to work for OS authentication (either via md5 or scram-sha-256). I think the best approach is:

  1. Detect the type of auth. If it's incompatible (e.g. reject), ask the user if you can modify it to OS-level authentication (scram-sha-256). If they refuse, exit script.
  2. If it's password, offer to export the PGPASSWORD env var at the start of the run.

sed_version="$(which sed)"

# reset the mode of local connection as "trust"
if [[ "$os" == "Darwin" ]] && [[ "$sed_version" == "/usr/bin/sed" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use "$os" == "Darwin" because users may install a different version of sed on their platform, instead you can check if --version has a valid return with gnu_sed=`sed --version 2>/dev/null` -> if $gnu_sed is non-empty, then it's GNU sed

# the database user with the same name as the system user exists
echo ""
else
# the database user with the same name as the system user does not exist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments which repeat the below echo statement are redundant, and can be removed

else
# the database user with the same name as the system user does not exist
echo "--- The database user with the same name as the system user does not exist. ---"
echo "--- The database user with the same name as the system user will be created. ---"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to carry on from the last sentence, e.g. "A new user will be created"

echo ""

# create database user with the same name as the system user
psql -U postgres -c "create user $(whoami) with superuser createdb createrole replication bypassrls password '$(whoami)';"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prompt the user for a password, rather than default it to the username.

build_starttime=$(date +%s)

# drop database if exists
psql -c "drop database if exists mimic3_carevue;"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The database name, schema name, and derived schema name should be environment variables set at the top of the script. This will allow you to add a --carevue flag which changes these variables to carevue specific ones.

I would suggest the database remains simple (mimic), but the schema is updated to mimiciii_cv, and the derived schema is also changed mimiciii_cv_derived.

@ningyile
Copy link
Author

ningyile commented Jan 24, 2023

This clearly took a lot of effort and I am very appreciative of the PR!

I like the idea of a very easy installation (we've always tried to make it simpler). We did start off with a Makefile for creating MIMIC, but in the end I found it was very difficult to keep things friendly across operating systems, and opted for a fully SQL approach.

With that in mind, after a quick look, there are some important changes.

  • I'd rather not split mimic-iii into mimic-iii and mimic-iii-carevue. Since mimic-iii-carevue is a subset of mimic-iii, it's easier to keep the one folder and use it for mimic-iii-carevue. You should be able to use most of the mimic-iii scripts to build the data. The only catch is the validation script, which we might need a new version of.
  • You'll have to use the existing concepts, and not create new SQL files. That should only require judicious use of relative paths.
  • I don't like having a registry file to register a new menu item in Windows. Instead, I'd ask the user to install the bash software you need, and tell them they need to launch a terminal in the folder. That avoids the awkwardness of having a registry file and it means all the operating systems have a mostly equivalent starting point.
  • For install.sh, I've made comments inline above
  • Internationalization of bash scripts is possible with some additional work, which seems like a better approach than the back and forth if statements.

Before you go ahead and put more work into this though, I'd like to discuss this with some colleagues (@tompollard / @briangow) - I'm not 100% sure we'd want to commit to maintaining something like this. It is hard enough to keep this repo up to date as is!

Thank you, Johnson! I think I can solve the problems you pointed out in the next version according to your requirements. But I need to check with you a few questions before I PR the next version of the shell script:

  • The SQL codes of MIMIC-III contains queries for Metavsion tables, and I'm not sure whether it would cause several fatal errors or not. Anyway, I will execute the MIMIC-III SQL codes using relative paths in the shell scripts of next version, instead of creating new SQL codes. However, due to the number of rows for data will be changed, tables may not be checked unless the SQL code of check_data is re-edited or created with a new sub-version.

  • For the issue of Bash Terminals on Windows, I'm going to ask users to decompress the portable version of Git (there is a download link for the portable version, just need to extract and use it without installing Git) and move it to the relative root directory of the MIMIC code, then change the directory where the shell script is located with cd command, and then start the installation script.

  • The purpose of changing the auth of pg_hba.conf is to avoid entering passwords multiple times during the installation (more automatically and silently). So can I do this as the following: ① First, ask the user whether they agree to change the type of auth to trust during the installation (note that only change the auth to trust during the installation, and restore the previous settings after the installation) ②Second, if the user agrees, change the type of auth of pg_hba.conf as trust with sed -i.bk -E "$sed_ptn" "$pg_hba". This command sed -i.bk ...... is compatible with both GNU and FreeBSD version of sed(here we don't have to worry about the version of sed with the compatible command), which can back up pg_hba.conf as pg_hba.conf.bk before the edition of auth. The original backup settings file will be restored and reloaded when the installation is completed, so that pg_hba.conf remains unchanged after the installation.

  • For the issue of internationalization of the bash script, I have tried to write the message in mo files(message object files, *.mo created from portable object files, *.po, need to copy to folder /usr/share/locale/zh_CN/LC_MESSAGES/, which is a problem for Windows) or two separate files (for example: en.ini and cn.ini), but I didn't want to generate additional files in codes of MIMIC-III, so I had to utilize if statements back and forth instead of the language profile mentioned above. Shell internationalization is an issue for cross-platform, but English doesn't seem to be a problem for most MIMIC users, so I'll consider using English only for the next version of the shell script.

@ningyile ningyile requested a review from alistairewj January 24, 2023 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants