-
Notifications
You must be signed in to change notification settings - Fork 2
/
Changelog.txt
456 lines (325 loc) · 18.9 KB
/
Changelog.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
### 0.5.0 ###
- Added -x/--no_suffix option to solid2fastq tool
- Added --force_rev option to trimfastq tool
- Added --gc_window option to ref_select tool
- Added ability to detect assembly issues from physical coverage
anomalies in ref_select
- Added CIRCLE command and associated options to graph_ops tool
- Added capability to LOAD and merge from multiple JSON files in
graph_ops
- Added capability to LOAD an optional separate edge file in
graph_ops
- Added capability to flag problem nodes in DOT file outoputs in
graph_ops
- Added capability to add "external" contigs using the INSERT command
in graph_ops
- Added capability to finely control removal of contigs with the soft
option to the SELCC command in graph_ops
- Added capability to link a subset of scaffolds by name using the
"scaff_names" and "exclusive" options of the SCAFLNK command in graph_ops
- Added capability for SCRIPT files to recursively call themselves
in graph_ops
- Changes to the graph_ops JSON schema to support the above features
- Updated nodewrap.sh script to use --max_semi_space option instead
of obsolete --max_new_space with newer versions of node.js
- Updated coffeescript files to correctly identify newer versions of
node.js
- Added a requirement for cmake version 2.8.5 or later to handle
an issue with out-of-order execution of tests that occurs in cmake
2.8.4
- Updated README with modern build instructions on MacOS
- Assorted minor bug fixes and performance improvements
- Documentation improvements.
### 0.4.18 ###
- Fixed a formatting bug in the --z_stats option of the tetracalc
tool. Also added documentation for the file format of the output
of this option.
- Added checks to be more proactive about deleting scaffolds and
clusters of scaffolds when an operation invalidates the connected
component structure of the connection graph. Also tightened-up
checking for the presence of scaffolds and/or clusters for
operations that require them.
- Fixed a crash in the graph_ops GC and GCC commands when there
are zero selected nodes.
- Changed recommended compiler to GCC 4.7 and added cmake error
when attempting to build under Xcode 5.
- Documentation improvements.
### 0.4.17 ###
- Fixed a bug in ref_select potentially affecting the accuracy of
contig %GC calculations when nucleotide reference sequences were
imported from a FASTA file using the --ref parameter. The bug only
affects reference sequences containing non-'N' ambiguous nucleotides
in one or more positions. Imported sequences consisting entirely of
[ACGNTacgnt] are not affected, and neither are reconstructed
sequences (using -q). However, for analyses using --ref and containing
affected sequences, it is highly recommended to re-run them through
ref_select and repeat all downstream analyses that depend on the
output %GC values, including most of the calculations done by
various graph_ops sub-commands that use mate-pair edge scores.
- Fixed a bug in the graph_ops GC and GCC commands which caused the
last completed command to be ommitted from the "Processing steps
completed" output. The command history was properly recorded, but
the most recent command wasn't being printed by GC/GCC.
- Fixed a crash in the graph_ops SELND command when no valid nodes
are specified in the input parameters.
### 0.4.16 ###
- Fixed a crash in ref_select that could occur with a seldom used
combination of input parameters. Specifically, if the --split option
was invoked without providing a catalog file ref_select would segfault
during preprocessing of the reference_sequence data.
- Fixed an issue in ref_select that could cause inaccurate (too low)
sequence coverage values to be reported and used in some downstream
calculations under specific circumstances. This issue only affected
contiguous genome-sized (megabase+) reference sequences with very high
read coverage (1000-fold+). The fix involves a precision change of
a variable used in coverage calculations, leading to slightly
different results under all circumstances (more significant digits
of precision), but it should not significantly change results except in
the problem cases noted above that lead to underflow in coverage
calculations.
- Fixed a crash in graph_ops when performing certain operations on an
input graph that contains no edges of a given type (such as output
from ref_select when run on fragment only alignments that contain no
mate-pair information).
- Documentation improvements.
### 0.4.15 ###
- The graph_ops SELCC command parameters min_nodes, min_seqlen, and
sequence/sequences now work as proper selection filters, acting on
either all connected components, or a subset defined by one of the
selection parameters (e.g. ccnum, names). These filter parameters
may also now be combined, resulting in a conjunctive (AND)
relationship (ie. to be selected, a connected component must be
selected by all of the filters.)
- Fixed problems in the graph_ops SELCC and SELCLUST commands that
could require SCAFF to be re-run after selecting a subset of scaffolds.
Existing scaffolds are now properly remapped when a subset of them are
selected via SELCC and SELCLUST. Note that scaffold clusters are still
lost when using SELCC, which is intentional.
- Fixed problems in the graph_ops MST, SST, SCAFF and SCRIPT commands
that could lead to crashes in several unusual corner cases.
- Fixed a bug in ref_select that could lead to problems reading SAM files
generated by aligners that do not remove trailing '/1' and '/2' paired
read name suffixes.
- With the release of Xcode 5, Apple has removed all support for GNU
compilers and OpenMP multicore threading support. Code was added to all
C programs and the build process to detect attempts to build/run without
OpenMP and generate approriate error messages.
- Documentation improvements
### 0.4.14 ###
- Fixed a bug that could lead to a crash in the graph_ops INSERT command
when run on a graph that contains no reconstructed sequence.
- Fixed a bug that could cause the graph_ops SCAFLNK command to find and
report suboptimal scaffold connections.
- Fixed a bug in the graph_ops CUTND command that could lead to
unintentional movement of mate-pair edges when the "include" parameter is
omitted. Also removed a previous limitation in CUTND that silently
prohibited moving edges to newly created nodes when the mean mate-pair
position of the edge didn't fall within the region of the new node.
- Added a new feature to graph_ops CUTND: the 'auto_include' parameter,
which will automatically move all selected mate-pair edges with positions
that fall within the cut sequence, as though they had been individually
named in an 'include' parameter.
- Changed the name of the graph_ops CUTND parameter 'start' to 'begin'.
'start' still works as a synonym, but its use is deprecated.
- Changed the default value of the "iterate" parameter of the graph_ops
PLUCK command from 2 to 3. This makes the default behavior more stable,
as stub branches of length 3 do occasionally occur, but are difficult
to track down, leading to scaffolds that are prematurely terminated.
PUSH already had an 'iterate' default value of 3, so this change also
makes the defaults of these complementary operations match each other.
- Documentation improvements.
### 0.4.13 ###
- Fixed a bug in ref_select that could cause the program to hang when
writing FASTQ output files on systems with only 1 or 2 processor cores, or
when the option --num_threads=2 was used.
- Fixed another case where the graph_ops PLUCK command could completely
remove a scaffold with a small number of contigs. Specifically, nodes with
two or more "in" edges and zero "out" edges (or the reverse) are no longer
plucked away.
- Documentation fixes.
### 0.4.12 ###
- Fixed bug that could cause crashes in graph_ops after use of the SELND
command.
- Fixed bug in graph_ops that caused PLUCK PLUCK PUSH PUSH to produce
unintended behavior different from PLUCK {iterate:2} PUSH {iterate:2}.
- Fixed a bug in ref_select that caused iterative reconstructions of the
same reference sequence to be shifted in position by one base per iteration.
- Added a new parameter --gap=<n> to the seq_scaffold command that controls
the base number of 'n's (ambiguous bases) to insert into remaining gaps
between contigs within a scaffold. The default remains unchanged at 15nt.
- Formerly, the graph_ops PLUCK command would remove both contigs of a two
node "scaffold", since each is technically a leaf. PLUCK would also remove
a single, unconnected contig on multiple PLUCK iterations. It no longer does
these things because they can lead to valid scaffolds with low contig counts
being "eroded away" and lost from subsequent analyses.
- A new parameter was added to graph_ops PLUCK: "min_len" prevents PLUCK from
removing unconnected nodes containing more than a minimum amount of sequence,
even on the first iteration.
- Added the 'sequences' parameter to the SELND and SELCC graph_ops commands.
This option is the same as the prior 'sequence' parmeter, except that it
accepts a list of sequences to match, (rather than just one).
- The graph_ops CUTND command now accepts negative indices for 'start' and
'end' parameters, representing offsets from the end of the sequence, rather
than the start. It is also more permissive with the provided coordaintes,
allowing negative start positions before the start of the sequence and end
positions after the end. This allows for trimming sequences to a maximum
length from either end without regard for sequence length.
- The graph_ops SCRIPT command CLI is now more permissive about the formatting
of the JSON parameter lists given for commands. They may now be quoted (to
permit cut/paste from bash command lines) and they may contain whitespace
between the JSON elements (which is permitted in quoted bash parameters).
- Documentation improvements.
### 0.4.11 ###
- The graph_ops INSERT command now has two new parameters which are enabled by
default. dup_kmer and dup_thresh control the k-mer length and a fractional
threshold used to determine if a contig about to be inserted between to
pre-existing scaffold contigs contains any substantial duplications (e.g. it is
a variant assembly) of contigs already in the scaffold. This can be a problem
for metagenomic assemblies where a few variant SNPs occurring near the end of a
contig can cause the contig assembler to produce an alternate assembly for that
region. In this case, the mate pairing information may incorrectly direct that
sequence into the gap nearest to where the other variant is already integrated
in an existing contig. These changes to INSERT detect and avoid most such
situations.
- The graph_ops SELND command now has the same "sequence" selection parameter
that was previously added to the SELCC command.
- seq_scaffold now attempts to find direct sequence overlaps to fill the gap
between neighboring contigs before attempting to heal the gap using contigs
from an alternate assembly. This change causes seq_scaffold to avoid some
situations where an erroneous alternate assembly can cause sequence
duplication into a gap that could be filled via a simple end-overlap.
- Improved the diagnostic (--verbose) output of seq_scaffold for gaps "healed"
with sequence from alternate assembly contigs.
- Fixed a bug that caused seq_scaffold to crash when all input scaffold
sequences are largely ambiguous.
- In ref_select, combination of --per_base and --mate_pair now implicitly
turn on --mp_inserts functionality; previously this needed to be done
manually, even though the above combination depends on enabling --mp_inserts
to work correctly.
- Improved the test_data set and included a script to regenerate the lambda
phage contigs and SAM format alignment files used to validate a SEAStAR build
via the "make test" CTest cases.
- Documentation improvements
### 0.4.10 ###
- Fixed a bug in ref_select where text in FASTQ read headers after the first
space was not stripped away. In reads with header text after the first space
this would prevent correct read ID matching and would affect reference sequence
reconstruction and matching/non-matching read output.
- Documentation improvements
### 0.4.9 ###
- Added a new parameter to ref_select and changed the default behavior when
determining which potential edges to write to the assembly graph. Previously,
only edges scoring 250.0 bits or greater were written by default. A new
parameter: --mp_mate_cnt has been introduced to complement the existing
parameter --mp_mate_lim. It has a default value of 2, and it is the
minimum number of read-pairs that must be represented in an edge for it
to be written (regardless of bitscore). With this change, the default value
of --mp_mate_lim is now 0.0. Many more edges will now be output by default,
although scripts written to use --mp_mate_lim will continue to work correctly,
with the slight change that they will no longer receive edges that are backed
by only a single read-pair. The exact previous behavior can be obtained by
adding --mp_mate_cnt=1 to the ref_select parameters.
- In conjuction with the above change, the JSON assembly graph file now
includes the read count for each output edge recorded in a new member called
"num".
- Fixed a bug in seq_scaffold that could cause it to crash when a contig
with fewer than 20 unambiguous bases was encountered.
- Documentation fixes and additions.
### 0.4.8 ###
- Added the new "shift" option to the graph_ops SELCC and SELCLUST commands
to enable smoother iterative constucts in script.go files. This provides a
shorthand way to keep all but the first clust or cc, (like range [1,-1],
but without generating an error when called with only one clust or cc
remaining.
- Added a new "sequence" option to the graph_ops SELCC command that allows
selecting one or more connected components based on the present of an exact
matching DNA subsequence (or its reverse compliment)
- Added a feature in graph_ops allowing scaffolds to survive use of SELCLUST
command with the "exclusive" option set to "true". Previously, rerunning
the SCAFF command was necessary in this case.
- Fixed a bug in the graph_ops SCRIPT command option "tag" which prevented
the tag value from propagating into recursive SCRIPT commands as intended.
- Fixed a bug in graph_ops that could cause crashes in the RELINK command
when invoked with the "complete" option set to true.
- Fixed bugs in graph_ops that caused corruption of STASHed graphs when an
invalid range of values was provided to the SELCC or SELCLUST commands.
- Fixed a bug in the graph_ops CLUST command that caused tetracalc to fail
when multiple commandline options were provided via the "options" parameter.
- Fixed a bug in the graph_ops FASTA command that could cause additional
problems if filesystem errors were encountered in writing the output data.
- Fixed a bug in seq_scaffold that would cause an error at the end of writing
to stdout on some platforms.
- Fixed a bug in trimfastq that incorrectly counted mate-paired reads with
--only_mates.
- Upgraded included zlib to version 1.2.8
- Fixed a few typos in documentation and help strings.
### 0.4.7 ###
- Added the "tag" parameter to the graph_ops SCRIPT command, allowing for .go
script files to be more easily reused, by enabling them to write output files
with different filenames and/or into different directory paths.
- Fixed a bug in RDP_tree_dev.coffee that did not properly handle pre-existing
incertae_sedis nodes.
- Fixed a bug in RDP_go that may caused it to exit with an error if a vmem
limit was already set before running the script.
- nodejs version 0.10 or higher is now required to run graph_ops, seq_scaffold
and other SEAStAR tools requiring command-line Javascript support.
- Fixed a bug that occasionally produced truncated files when using gzip
compressed output in graph_ops when run using nodejs version 0.10.x.
- Fixed assorted typos in help strings and documentation.
### 0.4.6 ###
- Fixed bugs in graph_ops that on rare occasions led to "stack overflow" errors
and/or invalid JSON output files when using the DUMP command with very large
sequence graph datasets.
- Adjusted the "Garbage Collection" settings used with nodejs to improve
performance and work around a rare case where memory fragmentation caused node
to allocate far more memory than required and become unstable.
### 0.4.5 ###
- Fixed a bug in ref_select that prevented proper sequence reconstruction when
reads were provided that lacked /1 or /2 suffixes on the read ids. This affects
Illumina reads and SOLiD reads prepared using solid2fastq with the -x option.
- Fixed issues in graph_ops which could cause poor performance or out of memory
errors when loading or saving very large JSON files, even when there is plenty
of memory available. With this change, it is recommended for performance reasons
that SEAStAR be run with node.js version 0.10 or higher.
- Misc documentation fixes and enhancements
### 0.4.4 ###
- Added new options to trimfastq supporting the output of FASTA format files and
adding more flexibility to the --mates_file output functionality. Also optimized
the case where -p = 0.0, to enable very efficient file format conversion using
--fasta and --mate_file when no additional trimming is desired.
- Fixed bug in fastq_nodup which led to corruption of the read headers in .single
output files for colorspace reads.
- All *.awk scripts in the /bin directory now have correct shebang lines, so they
may be located and run through the PATH.
- Added new script fastq_merge_csfasta.awk to pull modified csfasta read data back
into the originating fastq file. This is useful when using the SAET read error
correction tool, which only supports csfasta format files.
- Documentation fixes and updates reflecting the above changes.
### 0.4.3 ###
- General documentation improvements and fixed typos and formatting problems.
- Fixed a status printing bug in trimfastq
- Fixed bug in trimfastq Velvet paired interleaved output where wrong mate was reversed.
### 0.4.2 ###
- Documentation improvements; fixed typos and formatting problems when read on Github.
- Added the new "free" parameter to the graph_ops UNSTASH command.
- Fixed a segfault in ref_select when input SAM files contain no aligned reads.
- Fixed build problems in XCode projects.
- solid2fastq automatically checks for barcode strings of "default" or "Default".
### 0.4.1 ###
- Fixed typos and broken links in the documentation.
- Added this change log.
### 0.4.0 ###
- solid2fastq: Reversed the meaning of "read1" and "read2" file names for
mate-pairs. "read1" is the first read in a pair and "read2" is the second
read, 5' to 3'.
- solid2fastq: Added support for new native SOLiD file read type "F5-RNA" and
variations on "QV" placement in quality files.
- solid2fastq: Added -x parameter to turn of read ID suffix of "/1" and "/2".
- trimfastq: Removed -c parameter. This parameter toggled dinucleotide error
probability calculation. Now trimfastq automatically detects if a read set is
colorspace or nucleotide and chooses the appropriate error probability
calculation method.
- fastq_nodup: Fixed bug where index lengths of 32 would cause a segfault.
- fastq_nodup: Fixed bug where too many singlets were removed.
- Added new programs ref_select, tetracalc, graph_ops, and many other utility
scripts.