CSV support #210

Merged
merged 23 commits into from
Jun 4, 2024

Conversation

yashpatel6
Collaborator

@yashpatel6 yashpatel6 commented Jun 1, 2024

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the
    metadata.yaml and the manifest block in the nextflow.config as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

This PR adds CSV support back in. Inputs can now also be provided as a CSV, with cells left empty where a value does not apply. For BAM and FASTQ inputs, the CSV format remains the same as before. For SRC input, a set of columns is defined. For mixed inputs, empty cells can be used: for example, when providing both BAM and SNV call inputs, the rows giving SNV inputs can leave the BAM columns empty.

Regardless of input type, the common columns patient, sample, and state should always be provided and filled in.
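
As a rough illustration of the rule above (common columns always filled in, other cells allowed to be empty), here is a minimal Python sketch. It is not the pipeline's actual parser, which is not shown in this PR description. The column names `bam_path` and `snv_path`, the `tumor` state value, the file paths, and the `parse_rows` helper are all hypothetical; only `patient`, `sample`, and `state` come from the description.

```python
import csv
import io

# Columns the PR description says must be present and filled in for every row.
COMMON_COLUMNS = ("patient", "sample", "state")


def parse_rows(handle):
    """Yield one dict per CSV row, dropping optional cells that are empty.

    Hypothetical helper for illustration only, not the pipeline's parser.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in COMMON_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing common columns: {missing}")
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        # Keep only named columns whose cell has non-whitespace content.
        cleaned = {
            k: v.strip()
            for k, v in row.items()
            if k is not None and v and v.strip()
        }
        empty = [c for c in COMMON_COLUMNS if c not in cleaned]
        if empty:
            raise ValueError(f"line {line_number}: empty common cells: {empty}")
        yield cleaned


# Hypothetical mixed input: every column name other than the three common
# ones, and every value, is invented for this example.
EXAMPLE = """patient,sample,state,bam_path,snv_path
P1,S1,tumor,/path/to/S1.bam,
P1,S1,tumor,,/path/to/S1.vcf.gz
"""

rows = list(parse_rows(io.StringIO(EXAMPLE)))
for row in rows:
    print(row)
```

In this sketch the first row carries only the BAM column and the second only the SNV column, mirroring the mixed-input case described above; a row with an empty patient, sample, or state cell is rejected.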

The YAML templates were also updated to be clearer about which keys and values need to be filled in.

Testing Results

CSV input test

Tested with CSV input here: /hot/software/pipeline/metapipeline-DNA/Nextflow/development/unreleased/yashpatel-csv-support/sample_test

  • CSV: /hot/software/pipeline/metapipeline-DNA/Nextflow/development/unreleased/yashpatel-csv-support/sample_test/mix_input_snv.csv
  • Log: /hot/software/pipeline/metapipeline-DNA/Nextflow/development/unreleased/yashpatel-csv-support/sample_test/meta-csv.log
  • Config: /hot/software/pipeline/metapipeline-DNA/Nextflow/development/unreleased/yashpatel-csv-support/sample_test/template.config

Parsing match test

The parsing function itself was tested for the different input types with corresponding logs to show matching format to YAML: /hot/software/pipeline/metapipeline-DNA/Nextflow/development/unreleased/yashpatel-csv-support/parsing_test/log-{BAM,CRAM,FASTQ,SRC-CNA,SRC-SNV,SRC}.log

@tyamaguchi-ucla tyamaguchi-ucla self-requested a review June 1, 2024 05:07
Member

@zhuchcn zhuchcn left a comment

This is wonderful! I think a lot of people will appreciate this effort to keep supporting CSV input. So we now support CSV input for FASTQ, BAM, and BAM + SRC inputs, but not FASTQ + SRC inputs (which is totally fine with me)?

@yashpatel6
Collaborator Author

> This is wonderful! I think a lot of people will appreciate this effort to keep supporting CSV input. So we now support CSV input for FASTQ, BAM, and BAM + SRC inputs, but not FASTQ + SRC inputs (which is totally fine with me)?

Technically, FASTQ + SRC input should also work, though I haven't tested it explicitly here.

Contributor

@sorelfitzgibbon sorelfitzgibbon left a comment

I like the added explanation for YAML inputs.

yashpatel6 and others added 2 commits June 4, 2024 10:57
Co-authored-by: Sorel Fitz-Gibbon <3223552+sorelfitzgibbon@users.noreply.github.com>
Co-authored-by: Sorel Fitz-Gibbon <3223552+sorelfitzgibbon@users.noreply.github.com>
@yashpatel6 yashpatel6 merged commit 20839d8 into main Jun 4, 2024
5 checks passed
@yashpatel6 yashpatel6 deleted the yashpatel-csv-support branch June 4, 2024 18:03
3 participants