The flow of clinical trial data can quickly become complicated. This dashboard helps programmers and project managers understand the data flow and catch discrepancies quickly.
The dashboard can help with:
- visually exploring dataset connections
- ensuring that all main and QC programmers are notified when an upstream dataset changes
- making sure every raw dataset is necessary and actually used in the study
- checking if the actual data flow is in accordance with specification and industry/company standards
- comparing inputs between main and QC programs
- making sure we have all the datasets necessary to create each domain
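Two of these checks come down to set operations on the extracted connections. The following is a minimal Python sketch. It assumes that edges are rows with `from` and `to` keys, and that the list of available raw datasets and each program's inputs are already known. The function names are hypothetical and are not taken from the repository:

```python
def unused_raw_datasets(available_raw, edges):
    """Return raw datasets that never appear as the source of an edge."""
    used = {edge["from"] for edge in edges}
    return sorted(set(available_raw) - used)


def compare_inputs(main_inputs, qc_inputs):
    """Report datasets read by only one of the main and QC programs."""
    main, qc = set(main_inputs), set(qc_inputs)
    return {"main_only": sorted(main - qc), "qc_only": sorted(qc - main)}


# Hypothetical example data.
edges = [
    {"from": "RAW.DM", "to": "SDTM.DM"},
    {"from": "RAW.EX", "to": "SDTM.DM"},
]
print(unused_raw_datasets(["RAW.DM", "RAW.EX", "RAW.VS"], edges))  # ['RAW.VS']
print(compare_inputs(["RAW.DM", "RAW.EX"], ["RAW.DM"]))
# {'main_only': ['RAW.EX'], 'qc_only': []}
```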
- The program `create_graph_input.R` scans the SAS programs with regular expressions (regex). It looks for references to the "RAW", "SDTM" and "ADAM" libraries and builds a list of all datasets each program uses.
- It outputs two dataframes:
  - one with nodes (a unique list of all dataset names found)
  - one with edges (connections between datasets, e.g. RAW.DM --> SDTM.DM)
- For the purposes of this repository, I mask the data with a Python script (`masking.py`).
- The R Shiny script reads in the two datasets and builds a graph with the visNetwork library.
- The user can:
  - filter the data to show only a specific dataset
  - download the edges dataset, filtered or unfiltered
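The regex step that produces nodes and edges can be sketched in Python. This is an illustration under stated assumptions, not a port of `create_graph_input.R`. It assumes that datasets are referenced as `LIBREF.NAME` using the RAW, SDTM or ADAM librefs, and that a program's outputs are the datasets named in its `DATA` statements. Rows are shaped to match visNetwork's conventions: nodes have an `id` column, and edges have `from` and `to` columns. The function names are hypothetical:

```python
import re

# Assumption: datasets are referenced as LIBREF.NAME with RAW, SDTM or ADAM librefs.
LIB_REF = re.compile(r"\b(RAW|SDTM|ADAM)\.(\w+)", re.IGNORECASE)
# Assumption: outputs are the datasets named in DATA statements, e.g. "data sdtm.dm;".
DATA_STMT = re.compile(r"\bdata\s+(RAW|SDTM|ADAM)\.(\w+)", re.IGNORECASE)


def find_datasets(pattern, code):
    """Return unique LIB.NAME references matched by pattern, upper-cased and sorted."""
    return sorted({f"{lib}.{name}".upper() for lib, name in pattern.findall(code)})


def build_graph_input(programs):
    """Build nodes (id) and edges (from, to) from a {program name: SAS code} mapping."""
    found = set()
    edges = set()
    for code in programs.values():
        outputs = find_datasets(DATA_STMT, code)
        referenced = find_datasets(LIB_REF, code)
        found.update(referenced)
        # Link every dataset the program reads to every dataset it writes.
        for source in referenced:
            if source in outputs:
                continue
            for target in outputs:
                edges.add((source, target))
    nodes = [{"id": name} for name in sorted(found)]
    edge_rows = [{"from": s, "to": t} for s, t in sorted(edges)]
    return nodes, edge_rows


# Hypothetical example programs.
programs = {
    "dm.sas": "data sdtm.dm; merge raw.dm raw.ex; by usubjid; run;",
    "adsl.sas": "data adam.adsl; set sdtm.dm; run;",
}
nodes, edges = build_graph_input(programs)
# nodes: ADAM.ADSL, RAW.DM, RAW.EX, SDTM.DM
# edges: RAW.DM -> SDTM.DM, RAW.EX -> SDTM.DM, SDTM.DM -> ADAM.ADSL
```

A pure regex scan like this one also picks up references in commented-out code. It misses dataset names built with macro variables, so real programs would need extra handling.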
When the app starts, the graph shows all datasets in the input data.
Selecting a single node filters the graph to that dataset and shows only its connections. The app also renders a reactive table that lists the same relations in tabular form, and the user can download this dataset at any time.
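The node-selection filter and the download can be sketched in Python, outside of Shiny. This sketch assumes that "connections" means the selected dataset's direct inputs and outputs, and that edges are rows with `from` and `to` keys. The CSV layout is a hypothetical stand-in for whatever the app actually exports:

```python
import csv
import io


def filter_edges(edges, selected=None):
    """Keep edges that start or end at the selected dataset.

    With selected=None every edge is returned, mirroring the app's initial view.
    """
    if selected is None:
        return list(edges)
    return [e for e in edges if selected in (e["from"], e["to"])]


def edges_to_csv(edges):
    """Serialize edges to CSV text with a from,to header (hypothetical export format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["from", "to"])
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical example data.
edges = [
    {"from": "RAW.DM", "to": "SDTM.DM"},
    {"from": "RAW.EX", "to": "SDTM.DM"},
    {"from": "SDTM.DM", "to": "ADAM.ADSL"},
    {"from": "RAW.AE", "to": "SDTM.AE"},
]
print(edges_to_csv(filter_edges(edges, "RAW.AE")))
```

Filtering to direct neighbours keeps the view small. Tracing the full upstream and downstream lineage of a dataset would instead require walking the edges repeatedly until no new datasets are found.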