Skip to content

Conversation

@kuk0
Copy link

@kuk0 kuk0 commented Jun 23, 2021

see the comments in 48105b0
the two versions of isHeaderLine are not equivalent and the commit broke reading correct BED files

@38
Copy link
Contributor

38 commented Jun 23, 2021

Why revert it? This is a critical optimization that makes it significantly faster.

@38
Copy link
Contributor

38 commented Jun 23, 2021

I guess we could fix the parser - of course it has bug.

And please note, the original version isn't a reliable parsing code as well. The point is we need to handle most of the case.

see arq5x@48105b0
that commit broke reading correct BED files
@kuk0 kuk0 changed the title partial revert of 48105b07 fix parsing headers 48105b07 Jun 24, 2021
@kuk0
Copy link
Author

kuk0 commented Jun 24, 2021

This is a critical optimization that makes it significantly faster.

ok, well, but how much faster? do you have a benchmark?
how many times is that function (isHeaderLine) called?
e.g., if i have a file with 1M lines, but the header is just the first line, i would expect it to be called ~ once or twice (then it would be unnecessary to optimize it at all); if it's called 1M times, is it necessary?

@kuk0
Copy link
Author

kuk0 commented Jun 24, 2021

i have updated the diff and tested it on this BED file:

chr	start	end	name	score	strand
NC_045512.2	0	29903	sars2	.	+

bedtools v2.29.2 can read this file
bedtools v2.30.0 is broken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants