Description
When working with the SRA, a list of accession numbers may be exported. To insert that list directly into a config.yml file for use in genome-grist, we can use the sed command or base Python to format the list, read the config.yml file, and insert the list into the samples: section of the config.yml.
Using sed
My config.yml file:
samples:
outdir: outputs.trial/
sourmash_databases:
- gtdb-rs207.genomic-reps.dna.k31.zip
The accession list, exported directly from the SRA Run Browser as a txt file (saved here as short_acc_list.txt):
ERR5004365
ERR5003005
ERR5003006
ERR5003008
ERR5003010
ERR5003011
ERR5003578
ERR5001725
ERR5001726
ERR5001728
To format and insert the accession list into the samples: section of the yml:
sed "s/^/ - /" short_acc_list.txt | sed "/samples:/r /dev/stdin" -i config.yml
The first sed command inserts a space, a -, and another space at the beginning of each line in the accession list txt file. This formats the list for the config file.
The second sed command reads the output of the first command from /dev/stdin and inserts it after the line matching samples: in the config.yml file. The -i flag edits config.yml in place.
This produces a config.yml file in genome-grist's desired format:
samples:
- ERR5004365
- ERR5003005
- ERR5003006
- ERR5003008
- ERR5003010
- ERR5003011
- ERR5003578
- ERR5001725
- ERR5001726
- ERR5001728
outdir: outputs.trial/
sourmash_databases:
- gtdb-rs207.genomic-reps.dna.k31.zip
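One way to confirm the insertion worked is to read the entries under samples: back out of the config text and compare them with the accession list. The following is a minimal sketch in base Python. The helper name list_samples is hypothetical (it is not part of genome-grist), and it assumes the simple layout shown above rather than general YAML.

```python
def list_samples(config_text):
    """Return the list items found directly under the `samples:` line.

    Hypothetical helper: handles only the simple layout shown above,
    not arbitrary YAML.
    """
    samples = []
    in_samples = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped == "samples:":
            in_samples = True
        elif in_samples and stripped.startswith("- "):
            # Drop the leading "- " and keep the accession itself
            samples.append(stripped[2:].strip())
        elif in_samples and stripped:
            # First non-empty line that is not a list item ends the block
            break
    return samples


# Shortened, illustrative config text in the same layout as the output above
example_config = """samples:
 - ERR5004365
 - ERR5003005
outdir: outputs.trial/
sourmash_databases:
- gtdb-rs207.genomic-reps.dna.k31.zip
"""
print(list_samples(example_config))
```

Because the loop stops at the first non-list line after samples:, the entry under sourmash_databases: is not mistaken for a sample.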
Using base python
With the same config.yml and accession list as above, a Python script can produce the same output as the sed commands.
# Read the accession list text file and format the list to work in the config file
with open('short_acc_list.txt', 'r') as fp:
    lines = fp.readlines()
# Skip blank lines so they do not become empty list items
modified_lines = [' - ' + line.strip() for line in lines if line.strip()]

# Read the config file and insert each line of the formatted list in a new line after `samples:`
with open('config.yml', 'r') as fp:
    content = fp.read()
# Replace only the first occurrence of `samples:` so any later match is left untouched
modified_content = content.replace('samples:', 'samples:\n' + '\n'.join(modified_lines), 1)

# Overwrite the existing config file with the modified config file that contains the formatted list
with open('config.yml', 'w') as fp:
    fp.write(modified_content)
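Both approaches above insert the list every time they run, so running them twice on the same config.yml duplicates the entries. The sketch below is one possible way to guard against that: it works line by line, like sed's r command, and skips accessions that are already listed. The function name insert_accessions is hypothetical, and the sketch assumes the simple config layout shown earlier.

```python
def insert_accessions(config_text, accessions):
    """Insert accessions as list items after the first `samples:` line.

    Hypothetical helper: accessions already present as `- <accession>`
    lines are skipped, so a second run does not duplicate entries.
    """
    lines = config_text.splitlines()
    existing = {line.strip() for line in lines}
    new_items = [" - " + acc for acc in accessions
                 if "- " + acc not in existing]
    for i, line in enumerate(lines):
        if line.strip() == "samples:":
            # Splice the new items in directly after the matching line
            lines[i + 1:i + 1] = new_items
            break
    else:
        raise ValueError("no 'samples:' line found in config")
    return "\n".join(lines) + "\n"


# Illustrative input; in practice the text would be read from config.yml
config = "samples:\noutdir: outputs.trial/\n"
accessions = ["ERR5004365", "ERR5003005"]
once = insert_accessions(config, accessions)
twice = insert_accessions(once, accessions)
print(once)
print("second run changed nothing:", once == twice)
```

Returning the new text instead of writing the file keeps the function easy to test. The caller decides when to overwrite config.yml.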