Tools for implementing the Deep.piste study.
On Windows:
You may need to install the "Visual C++ Redistributable Packages for Visual Studio 2013" and the Microsoft C++ Build Tools.
On Ubuntu:
sudo apt-get install python3-tk
pip install deep-piste
- Download source code
git clone https://github.com/Epiconcept-Paris/deidcm.git
git clone https://github.com/Epiconcept-Paris/deep.piste.git
- Create and activate a virtual environment
cd deep.piste/
python3 -m venv env
. env/bin/activate
- Install deidcm
cd ../deidcm
pip install -e .
- Install deep.piste
cd ../deep.piste
pip install -e .
- Checking deidcm installation
Open a python interpreter and try to deidentify a dicom file:
from deidcm.dicom.deid_mammogram import deidentify_image_png
deidentify_image_png(
    "/path/to/mammogram.dcm",
    "/path/to/processed/output-folder",
    "output-filename"
)
- Checking deep.piste installation
When running the following command, you should see the help menu:
python3 -m dpiste -h
usage: __main__.py [-h] {extract,transform,export,backup} ...
positional arguments:
{extract,transform,export,backup}
extract Invoke initial extractions commands
transform Perform transformation on input data
export Sending data
backup Back up data
options:
-h, --help show this help message and exit
- Install quality tools
pip install -e .[quality-tools]
- Run all tests
pytest
- Run test coverage
coverage run --omit="*/test*,*/deidcm/*" -m pytest
- Visualize report in terminal
coverage report -i
- Format your files with python3 -m autopep8 --in-place file/to/format
- Lint your files with python3 -m pylint file/to/lint
The Epiconcept server is used as a hub for screening data collection before all data are sent via Secure File Transfer Protocol (SFTP) to the HDH server.
- Screening data from the CRCDC (regional screening coordination center) need to be collected from Neoscope on a CRCDC workstation and sent encrypted via Epifiles to the Epiconcept server.
- This step requires the Epiconcept data manager to intervene on the CRCDC operator's workstation.
- Prerequisites on the CRCDC operator workstation (a quick check sketch follows this list):
  - Python 3.8.12 with tkinter
  - Visual C++ Redistributable Packages for Visual Studio 2013
  - deep.piste package (python -m pip install deep-piste) installed in a Python 3 virtual environment
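A minimal sketch for checking these prerequisites from a Python interpreter on the CRCDC workstation (a convenience check only, not part of the dpiste tooling):

import importlib.util
import sys

# Expecting Python 3.8.x with tkinter available, and the deep.piste package importable as dpiste
print("Python version:", sys.version)
print("tkinter available:", importlib.util.find_spec("tkinter") is not None)
print("dpiste available:", importlib.util.find_spec("dpiste") is not None)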
- Screening data extracted by the CRCDC operator with the following requests (replace date1 and date2 below with actual dates):
- The Epiconcept data manager generates the encryption QR code on their local machine with the deep.piste package:
python -m dpiste extract neoscope generate-qr-key
- On the CRCDC operator workstation with the prerequisites installed, run:
python -m dpiste extract neoscope encrypt -w
- Upload the encrypted data from the CRCDC operator workstation to Epifiles
- From the Epiconcept server, pull from Epifiles with the QR code copied to the clipboard:
python -m dpiste extract neoscope pull2bigdata -u [epifiles-user]
The zipped folder extraction_neoscope.zip is now loaded on the Epiconcept server and contains the screening data for the perimeter defined in the requests above (a quick integrity check sketch follows).
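A short sketch to confirm the pulled archive is a readable zip and list its contents (the path below is an assumption; adjust it to the server layout):

import zipfile

ARCHIVE = "data/input/neo/extraction_neoscope.zip"  # assumed destination on the Epiconcept server

if zipfile.is_zipfile(ARCHIVE):
    with zipfile.ZipFile(ARCHIVE) as archive:
        # list the files contained in the Neoscope extraction
        print("\n".join(archive.namelist()))
else:
    print(f"{ARCHIVE} is not a valid zip archive")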
- The Epiconcept data manager opens an extraction request on the Esis-3D portal, containing the following SQL request:
- The resulting file from the esis extraction is esis_dicom_guid.parquet
- All files must be on the Epiconcept server, organized as follows:
├── input
│ ├── crcdc
│ │ └── refusing-list.csv # list of patients excluded from study perimeter
│ ├── easyocr
│ │ ├── craft_mlt_25k.pth # weights of the EasyOCR model for image anonymization
│ │ └── latin_g2.pth # weights of the EasyOCR model for image anonymization
│ ├── epiconcept
│ │ ├── mapping-table.csv # mapping table between patient ids and pseudo ids
│ │ └── ocr_deid_ignore.txt # text elements to be ignored at anonymization step
│ ├── esis
│ │ └── esis_dicom_guid.parquet # esis extraction described above
│ ├── hdh
│ │ └── p11_encryption_public.rsa # encryption public ssh key provided by HDH operator (open HDH Zendesk ticket)
│ └── neo
│ └── extraction_neoscope.zip # neoscope CRCDC extraction described above
└── output
├── cnam
│ ├── duplicates_to_keep.csv # list of duplicate entries to keep
│ └── safe.zip
└── hdh
├── p11_transfer_private_key.rsa # signature ssh key generated by Epiconcept data manager
└── p11_transfer_public_key.rsa # signature ssh public key, to send to HDH operator in a secure manner
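Before launching the transfer, the layout above can be sanity-checked with a short sketch like this one (the DP_HOME path is an assumption; the file names come from the tree above):

import os

DP_HOME = "/home/operator/DP_HOME"  # assumed root folder containing the data/ directory
EXPECTED_FILES = [
    "data/input/crcdc/refusing-list.csv",
    "data/input/easyocr/craft_mlt_25k.pth",
    "data/input/easyocr/latin_g2.pth",
    "data/input/epiconcept/mapping-table.csv",
    "data/input/epiconcept/ocr_deid_ignore.txt",
    "data/input/esis/esis_dicom_guid.parquet",
    "data/input/hdh/p11_encryption_public.rsa",
    "data/input/neo/extraction_neoscope.zip",
]

# report which expected input files are present and which are missing
for relative_path in EXPECTED_FILES:
    status = "OK" if os.path.isfile(os.path.join(DP_HOME, relative_path)) else "MISSING"
    print(status, relative_path)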
- On the local Epiconcept data manager workstation:
  - deep.piste package installed in a Python 3.10 venv (it might work with another Python version, but it has only been tested with 3.10)
  - inside the deep.piste package, dpiste/ansible/export_hdh/hosts filled in with the ssh hosts info (the same used to connect to the Epiconcept servers)
  - Open and update the ansible/group_var/nodes.yml file with the latest config info (a sanity-check sketch follows this list):
    - ssh_user: name of the Epiconcept operator on the Epiconcept servers
    - ssh_source_key: path to the private ssh key used to connect to the Epiconcept servers
    - ssh_source_key_pub: same, but for the public key
    - python_path: path to Python on each of the Epiconcept servers; Python must be installed by the operator if not already done (Python 3.8 was used in 2024)
    - dp_home: path to the folder (not the deep.piste package) containing the data folder with all data required for the transfer (see above)
    - dp_code: same value as dp_home
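A minimal sanity check of nodes.yml before running the playbook, assuming PyYAML is available in the venv (the file path follows the layout given above; adjust it to your checkout):

import yaml

REQUIRED_KEYS = [
    "ssh_user",
    "ssh_source_key",
    "ssh_source_key_pub",
    "python_path",
    "dp_home",
    "dp_code",
]

# load the ansible group variables and report any missing key
with open("dpiste/ansible/export_hdh/group_var/nodes.yml") as config_file:
    config = yaml.safe_load(config_file) or {}

missing = [key for key in REQUIRED_KEYS if key not in config]
print("missing keys:", missing if missing else "none")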
The Epiconcept operator must install Pass on their local workstation and have the following 3 keys (a quick check sketch follows this list):
  - infra/bgdt: password to sudo on the Epiconcept servers, provided by Epiconcept infra
  - infra/sshpassphrase: ssh key passphrase to connect to the Epiconcept servers
  - epitools/hdh_epi_transfer_passphrase: passphrase for the ssh signature key (stored at SERVER:/space/Work/operator/github/deep.piste/data/output/hdh)
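A short sketch to confirm the three entries exist in the local Pass store without printing the secrets (assumes the pass CLI is installed and initialized):

import subprocess

REQUIRED_ENTRIES = [
    "infra/bgdt",
    "infra/sshpassphrase",
    "epitools/hdh_epi_transfer_passphrase",
]

for entry in REQUIRED_ENTRIES:
    # `pass show` exits with a non-zero status when the entry is missing;
    # output is discarded so the secret is never displayed
    result = subprocess.run(
        ["pass", "show", entry],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    print(entry, "OK" if result.returncode == 0 else "MISSING")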
Finally, check all lines of roles/node/tasks/main.yml:
  - update the user name where it is hardcoded
  - update the Python version and paths if your configuration differs
Epiconcept servers preparation:
- After getting access rights to the SFTP, transfer the files unchanged from the SFTP to the Epiconcept server whose DP_HOME will be used for the transfer; on that server, from an sftp session, run:
get -R input_data /home/operator/DP_HOME/data/input
- Check the hash of mapping_table.csv: the expected hash is given in blue (a sketch for computing it follows).
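A sketch for computing the hash to compare against the expected value, assuming SHA-256 is the agreed algorithm (the file path and the algorithm are assumptions):

import hashlib

MAPPING_TABLE = "data/input/epiconcept/mapping-table.csv"  # path from the layout above

# stream the file so large tables do not need to fit in memory
sha256 = hashlib.sha256()
with open(MAPPING_TABLE, "rb") as mapping_file:
    for chunk in iter(lambda: mapping_file.read(8192), b""):
        sha256.update(chunk)

print("computed sha256:", sha256.hexdigest())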
From the activated venv on the local workstation, run the following command from the dpiste/ansible/export_hdh directory:
ansible-playbook export-data-hdh.yml -i hosts
The playbook runs the successive steps needed to launch the transfer. The main Python command used to run the transfer from the servers to the SFTP can be found in /ansible/export_hdh/roles/running_export/tasks/main.yml:
python -m dpiste export hdh sftp -s {{ sftp_server }} -o {{ organization id }} -u {{ sftp_user }} -l 100 -t {{ tmp_dir }} -i {{ hosts }}
where -o is the organization id, -l is the limit, in GB, on the size of data transferred before the transfer stops (default 100), and -i is the list of servers defined in the hosts file above.
NB: the transfer stops at whichever is reached first, 95% of the available space on the SFTP or the -l limit, and then waits until the HDH job starts deleting files from the SFTP.
NB: command to stop the export:
ansible-playbook export-data-hdh.yml -t stop -i hosts