deep.piste

Tools for implementing the Deep.piste study.

Installation

Prerequisites

On Windows:

You may need to install the "Visual C++ Redistributable Packages for Visual Studio 2013" (Microsoft C++ Build Tools).

On Ubuntu:

sudo apt-get install python3-tk

Installation

pip install deep-piste

Installation for contributors

  1. Download source code
git clone https://github.com/Epiconcept-Paris/deidcm.git
git clone https://github.com/Epiconcept-Paris/deep.piste.git
  2. Create and activate a virtual environment
cd deep.piste/
python3 -m venv env
. env/bin/activate
  3. Install deidcm
cd ../deidcm
pip install -e .
  4. Install deep.piste
cd ../deep.piste
pip install -e .

Checking installation

  1. Checking deidcm installation

Open a Python interpreter and try to de-identify a DICOM file:

from deidcm.dicom.deid_mammogram import deidentify_image_png

deidentify_image_png(
    "/path/to/mammogram.dcm",
    "/path/to/processed/output-folder",
    "output-filename"
)
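
If deidcm is installed correctly, this call should produce a de-identified PNG of the mammogram in the output folder.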
  2. Checking deep.piste installation

When running the following command, you should see the help menu:

python3 -m dpiste -h

usage: __main__.py [-h] {extract,transform,export,backup} ...

positional arguments:
  {extract,transform,export,backup}
    extract           Invoke initial extractions commands
    transform         Perform transformation on input data
    export            Sending data
    backup            Back up data

options:
  -h, --help          show this help message and exit

Tools for developers

Installation

pip install -e .[quality-tools]

Test and Test Coverage

Tests

Run all tests

pytest

Calculate and Visualize Test Coverage

  1. Run test coverage
coverage run --omit="*/test*,*/deidcm/*" -m pytest
  2. Visualize report in terminal
coverage report -i

Formatter and Linter

Format your files with python3 -m autopep8 --in-place file/to/format

Lint your files with python3 -m pylint file/to/lint

Procedure for transferring screening data to the Health Data Hub (HDH) servers

The Epiconcept server is used as a hub for screening data collection before all data are sent via Secure File Transfer Protocol (SFTP) to the HDH server.

Procedure for extracting screening data from the CRCDC database to the Epiconcept server

Design

  • Screening data from the CRCDC (regional screening coordination center) need to be collected from Neoscope on the CRCDC workstation and sent encrypted via Epifiles to the Epiconcept server.
  • This step requires the Epiconcept data manager to intervene on the CRCDC operator's workstation.

[Figure: screening data flow from the CRCDC workstation via Epifiles to the Epiconcept server]

Prerequisites

  • Prerequisites on the CRCDC operator workstation:
    • Python 3.8.12 with tkinter
    • Visual C++ Redistributable Packages for Visual Studio 2013
    • deep.piste package (python -m pip install deep-piste) installed in a Python 3 virtual environment
    • Screening data extracted by the CRCDC operator with the following queries (replace date1 and date2 below with actual dates):

Women list request

Women at risk request

Screening track record

Diagnosis

Adicap codes

Radiology centers

Running the export

  • The Epiconcept data manager generates an encryption QR code on the local machine with the deep.piste package:
python -m dpiste extract neoscope generate-qr-key
  • From the CRCDC operator workstation with the prerequisites installed, run:
python -m dpiste extract neoscope encrypt -w
  • Upload the encrypted data from the CRCDC operator workstation to Epifiles
  • From the Epiconcept server, pull from Epifiles with the QR code copied to the clipboard:
python -m dpiste extract neoscope pull2bigdata -u [epifiles-user]
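
The key exchange can be pictured as follows; this is a minimal sketch using a symmetric Fernet key, not the actual dpiste implementation:

# Minimal sketch of the QR-based key exchange idea (illustration only,
# NOT the actual dpiste implementation). The point of the QR code is that
# the key itself never transits through Epifiles.
from cryptography.fernet import Fernet

# 1. Data manager's machine: generate a key and display it as a QR code
#    (the QR encoding/scanning step is omitted here).
key = Fernet.generate_key()
print("Encode this in a QR code:", key.decode())

# 2. CRCDC workstation: encrypt the extraction with the scanned key.
with open("extraction_neoscope.zip", "rb") as f:
    token = Fernet(key).encrypt(f.read())
with open("extraction_neoscope.zip.enc", "wb") as f:
    f.write(token)

# 3. Epiconcept server: decrypt after pulling the file from Epifiles.
with open("extraction_neoscope.zip.enc", "rb") as f:
    data = Fernet(key).decrypt(f.read())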

Results

A zipped folder, extraction_neoscope.zip, is loaded on the Epiconcept server, containing the screening data for the perimeter defined in the queries above.
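
As an optional sanity check, the archive contents can be listed from a Python interpreter (the path is an assumption; adapt it to where the file lands on your server):

# List the contents of the pulled archive to confirm it arrived intact.
import zipfile

with zipfile.ZipFile("extraction_neoscope.zip") as z:
    print(z.namelist())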

Procedure for extracting Esis DICOM metadata

  • The Epiconcept data manager opens an extraction request on the Esis-3D portal, containing the following SQL query:

Esis extraction request

  • The resulting file from the Esis extraction is esis_dicom_guid.parquet
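
To sanity-check the extraction, the parquet file can be inspected with pandas (no column names are assumed here; they are whatever the Esis request returns):

# Inspect the shape and schema of the Esis extraction.
import pandas as pd

df = pd.read_parquet("esis_dicom_guid.parquet")
print(df.shape)
print(df.dtypes)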

Procedure for running the export

Prerequisites

  • All files must be on an Epiconcept server:
├── input
│   ├── crcdc
│   │   └── refusing-list.csv # list of patients excluded from the study perimeter
│   ├── easyocr
│   │   ├── craft_mlt_25k.pth # weights of the EasyOCR model for image anonymization
│   │   └── latin_g2.pth # weights of the EasyOCR model for image anonymization
│   ├── epiconcept
│   │   ├── mapping-table.csv # mapping table between patient ids and pseudo ids
│   │   └── ocr_deid_ignore.txt # text elements to be ignored at the anonymization step
│   ├── esis
│   │   └── esis_dicom_guid.parquet # Esis extraction described above
│   ├── hdh
│   │   └── p11_encryption_public.rsa # encryption public ssh key provided by the HDH operator (open an HDH Zendesk ticket)
│   └── neo
│       └── extraction_neoscope.zip # Neoscope CRCDC extraction described above
└── output
    ├── cnam
    │   ├── duplicates_to_keep.csv # list of duplicate entries to keep
    │   └── safe.zip
    └── hdh
        ├── p11_transfer_private_key.rsa # signature ssh private key generated by the Epiconcept data manager
        └── p11_transfer_public_key.rsa # signature ssh public key, to send to the HDH operator in a secure manner
  • On the local Epiconcept data manager workstation:
    • deep.piste package installed in a Python 3.10 venv (it might work with another Python version, but it has only been tested with 3.10)
    • inside the deep.piste package, dpiste/ansible/export_hdh/hosts filled with the ssh hosts info (the same as used to connect to the Epiconcept servers)

[Screenshot: example dpiste/ansible/export_hdh/hosts file]

  • Open and update the ansible/group_var/nodes.yml file with the latest config info:

[Screenshot: example ansible/group_var/nodes.yml file]

- ssh_user: name of the Epiconcept operator on the Epiconcept servers
- ssh_source_key: path to the private ssh key used to connect to the Epiconcept servers
- ssh_source_key_pub: idem, but the public key
- python_path: path to Python on each of the Epiconcept servers; Python is to be installed by the operator if not done already (Python 3.8 used in 2024)
- dp_home: path to the folder (not the deep.piste package) containing the _data_ folder with all the data required for the transfer (see above)
- dp_code: same value as dp_home
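
A hypothetical nodes.yml illustrating these variables; every value below is a placeholder, not the actual configuration:

# All values are placeholders; adapt them to your own setup.
ssh_user: operator
ssh_source_key: ~/.ssh/id_rsa_epiconcept
ssh_source_key_pub: ~/.ssh/id_rsa_epiconcept.pub
python_path: /usr/bin/python3.8   # Python 3.8 used in 2024
dp_home: /home/operator/DP_HOME
dp_code: /home/operator/DP_HOME   # dp_code must equal dp_home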
  • The Epiconcept operator must install Pass on their local workstation and have the 3 following keys:

    • infra/bgdt: password to sudo on the Epiconcept servers, provided by Epiconcept infra
    • infra/sshpassphrase: ssh key passphrase to connect to the Epiconcept servers
    • epitools/hdh_epi_transfer_passphrase: passphrase for the ssh signature key (stored at path SERVER:/space/Work/operator/github/deep.piste/data/output/hdh)
  • Finally, check all lines of roles/node/tasks/main.yml:

    • update the user name where it is hardcoded
    • update the Python version and paths if you have a different config
  • Epiconcept server preparation:

    • After getting access rights to the SFTP, transfer the unchanged files from the SFTP to the Epiconcept server DP_HOME that will be used for the transfer; from the sftp prompt, run:
    SERVER get -R input_data /home/operator/DP_HOME/data/input

    • Check the hash of mapping-table.csv against the expected hash, shown in blue in the screenshot below; a sketch of the check follows.

[Screenshot: expected hash of mapping-table.csv, shown in blue]
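
A minimal way to compute the digest for this comparison, assuming SHA-256 (the actual algorithm used by the project is not specified here; swap in another one if needed):

# Compute the digest of the mapping table and compare it with the expected
# hash. SHA-256 and the file path are assumptions; adapt both to your setup.
import hashlib

with open("/home/operator/DP_HOME/data/input/epiconcept/mapping-table.csv", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())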

Running the export

From the activated venv on the local workstation, run the following command with dpiste/ansible/export_hdh as the working directory:

ansible-playbook export-data-hdh.yml -i hosts

The playbook runs the successive steps to launch the transfer. To run the transfer from the servers to the SFTP, the main Python command can be found in /ansible/export_hdh/roles/running_export/tasks/main.yml:

python -m dpiste export hdh sftp -s {{ sftp_server }} -o {{ organization_id }} -u {{ sftp_user }} -l 100 -t {{ tmp_dir }} -i {{ hosts }}

where -l is the maximum amount of data to transfer before the transfer stops, in GB (default: 100), and -i takes the servers list defined in hosts above.

NB: the transfer will stop at the minimum of 95% of the available space on the SFTP and the -l argument, and then wait until the HDH job starts deleting files from the SFTP.
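
The stopping rule can be written out as follows (a sketch of the rule stated above, not the actual dpiste code):

# Sketch of the stopping rule described above, not the actual dpiste code.
def effective_limit_gb(sftp_free_gb, limit_arg_gb=100.0):
    # The transfer pauses once the smaller of the two ceilings is reached.
    return min(0.95 * sftp_free_gb, limit_arg_gb)

print(effective_limit_gb(200.0))  # -> 100.0, the -l argument binds
print(effective_limit_gb(80.0))   # -> 76.0, 95% of free SFTP space binds first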

Expected output:

[Screenshot: expected output of the export playbook]

NB: command to stop the export:

ansible-playbook export-data-hdh.yml -t stop -i hosts
