1. Problem Statement
We need a way to indicate the file type of each output file stored in IRIDA Next (e.g., reads, assemblies, MLST allele profiles).
1.1. Adjust metadata associated with files
One set of solutions focuses on adjusting metadata/attributes (such as type and format) in IRIDA Next. There may be a need to extend this metadata to include additional keys (such as scheme for a cg/wgMLST scheme). This needs to be conveyed/defined by a pipeline (or at least, when IRIDA Next stores analysis results files, it needs to set the correct keys to the correct values).
1.2. Adjust file suffixes
Another set of solutions focuses on adjusting output file suffixes (e.g., *.fastq.gz or *.fasta.gz).
2. Solution 0: IRIDA Next defines attributes of files
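Here, IRIDA Next defines a set of attributes for attached files based on different criteria (such as the file name). This is currently implemented within IRIDA Next. An example of defined attributes can be found at:
https://github.com/phac-nml/irida-next/blob/72e0a6f32f932fa608ebccee0496b7c2d041e2db/test/fixtures/attachments.yml#L34-L43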
3. Solution 1: Define additional attributes in IRIDA Next
This solution involves adding additional attributes in IRIDA Next for key file formats and types we need to work with. This makes it easier to select these file types within pipelines or from the IRIDA Next API. These attributes will be determined by IRIDA Next.
3.1. Additional formats and types
format="json", type="mlst": For (cg/wg)MLST results stored as a JSON file.
format="fasta", type="assembly": For assembled genomes in FASTA format.
format="genbank", type="assembly": For assembled/annotated genomes in GenBank format.
3.2. Additional attributes
Beyond the new format/type values, it might make sense to define further attributes that can be set. In particular:
mlst_scheme: This attribute defines the particular MLST scheme the associated file represents. This could be read from information within the MLST alleles JSON file.
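As a rough sketch of how attributes like these might be assigned on the IRIDA Next side, the Python below maps a file name to one of the format/type pairs listed above and, for MLST JSON files, reads the scheme from the file content. This is not IRIDA Next code: the file-name suffixes, the `infer_attributes` helper, and the top-level `"scheme"` key in the MLST JSON are all assumptions made for illustration, and the actual MLST JSON layout may differ.

```python
import json

# Hypothetical file-name suffixes; the format/type pairs are the ones proposed above.
SUFFIX_ATTRIBUTES = [
    (".mlst.json", {"format": "json", "type": "mlst"}),
    (".fasta", {"format": "fasta", "type": "assembly"}),
    (".gbk", {"format": "genbank", "type": "assembly"}),
]


def infer_attributes(filename, content=None):
    """Return the attributes to attach to a file, or {} if no suffix matches."""
    for suffix, attributes in SUFFIX_ATTRIBUTES:
        if filename.endswith(suffix):
            result = dict(attributes)
            # Assumption: the MLST JSON is an object with a top-level "scheme" key.
            if result["type"] == "mlst" and content is not None:
                scheme = json.loads(content).get("scheme")
                if scheme is not None:
                    result["mlst_scheme"] = scheme
            return result
    return {}
```

For instance, `infer_attributes("sample.mlst.json", '{"scheme": "example_scheme"}')` would attach `mlst_scheme` alongside `format` and `type`, while a FASTA file would only get the format/type pair.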
3.3. Advantages
Minimal modifications needed for IRIDA Next
3.4. Disadvantages
Less flexible for extending file attributes in the future
Assignment of attributes determined entirely by pipeline developers
4. Solution 2: Define additional attributes in pipeline
Similar to Solution 1, IRIDA Next will include additional attributes associated with a file (such as mlst_scheme). However, the values of these attributes can be set by a pipeline in the iridanext.output.json output file.
Currently, the iridanext.output.json file defines files associated with samples (or with the analysis pipeline as a whole) as a list of JSON objects, each of which includes the key path. This solution would extend this JSON structure to add additional keys associated with files.
4.1. Example
For example, the following could be an iridanext.output.json output file:
That is, each file entry has an associated type, or mlst_scheme (or other defined keywords).
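A minimal sketch of what such an extended file might contain, written as Python that builds the structure and serializes it. The `files`, `global`, and `samples` keys reflect my understanding of the current format; the sample name, file paths, scheme value, and the placement of `type` and `mlst_scheme` next to `path` are assumptions for illustration only.

```python
import json

# Sample name, paths, and attribute values below are made up for illustration.
output = {
    "files": {
        "global": [],
        "samples": {
            "SAMPLE1": [
                # Each entry keeps the existing "path" key and gains extra keys.
                {"path": "assembly/SAMPLE1.fasta.gz", "type": "assembly"},
                {
                    "path": "mlst/SAMPLE1.mlst.json.gz",
                    "type": "mlst",
                    "mlst_scheme": "example_scheme",
                },
            ]
        },
    }
}

# Serialize in the same shape a pipeline would write to iridanext.output.json.
print(json.dumps(output, indent=4))
```

Keeping the new keys in the same object as `path` means existing consumers that only read `path` would presumably keep working unchanged.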
4.2. Providing keys to iridanext.output.json
If the nf-iridanext plugin is used to write the iridanext.output.json file, then the following Nextflow configuration could be used to create the additional keys.
Here, I assume that in the Nextflow pipeline --scheme is used to define the MLST scheme, which is passed as metadata to generate the final iridanext.output.json file.
4.3. Advantages
Pipeline developers can define the file attributes rather than code located in IRIDA Next.
This allows each pipeline to customize the type of attributes and values to use.
4.4. Disadvantages
More complicated code changes
4.5. Questions/Caveats
How to handle situations where both IRIDA Next and a pipeline attempt to write to the same attribute?
5. Solution 3: Name output files with specific suffixes
In this solution, output files to be saved by IRIDA Next carry specific suffixes, which define the file type and constrain file selection in a pipeline.
Specifically:
*.fastq.gz (or *.fq.gz): Defines reads (fastq format).
*.fasta.gz: An assembled genome (could also be *.assembly.fasta.gz).
*.mlst.json.gz: MLST allele profiles in JSON format.
The iridanext.output.json.gz would list the files with the appropriate names. That is:
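To illustrate how these suffixes could drive file selection, here is a small Python sketch that classifies paths, such as those listed in an iridanext.output.json file, by suffix. The suffixes come from the list above; the `file_type` helper and the labels `reads`, `assembly`, and `mlst` are hypothetical names chosen for this example.

```python
# Suffix-to-type table based on the suffixes proposed above.
# The type labels are illustrative, not an agreed vocabulary.
SUFFIX_TYPES = [
    (".fastq.gz", "reads"),
    (".fq.gz", "reads"),
    (".mlst.json.gz", "mlst"),
    (".fasta.gz", "assembly"),  # also matches *.assembly.fasta.gz
]


def file_type(path):
    """Return the type implied by the file suffix, or None if unrecognized."""
    for suffix, kind in SUFFIX_TYPES:
        if path.endswith(suffix):
            return kind
    return None
```

A pipeline or IRIDA Next could then select, say, only the entries for which `file_type(entry["path"]) == "assembly"`, without needing any extra keys in the JSON.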