-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Genbank parser rewrite #437
base: main
Are you sure you want to change the base?
Conversation
wantErr bool | ||
}{{ | ||
name: "parses example header from spec", | ||
data: `GBBCT1.SEQ Genetic Sequence Data Bank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the cases where wantErr
is nil, perhaps it could be helpful to define data
in terms of want.
So for instance, Header
could have a method like
func (h Header) bytes() []bytes {
strbuilder := strings.Builder{}
for _, r := range h.FileName {
strbuilder.WriteRune(r)
}
for i := 0; i < paddingBeforeTitle-len(h.FileName); i++ {
strbuilder.WriteByte(" "[0]) // maybe hard code some common byte values for these :shrug:
}
for _, r := range h.Title {
strbuilder.WriteRune(r)
}
strbuilder.WriteByte("\n"[0])
// etc.
}
for _, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
var p *Parser
if data == "" {
p = NewParser(strings.NewReader(tc.want)
} else {
p = NewParser(strings.NewReader(tc.data))
}
got, err := p.parseHeader()
if gotErr := err != nil; gotErr != tc.wantErr {
t.Fatalf("incorrect error returned from parseHeader, wantErr: %v, err: %v", tc.wantErr, err)
}
if diff := cmp.Diff(tc.want, got); !tc.wantErr && diff != "" {
t.Fatalf("parseHeader returned incorrect header, (-want, +got): %v", diff)
}
})
}
This should do 2 very nice things:
- Contributors don't need to to spend time aligning multi line string literals in accordance to the spec. All they have to do is define a Struct. More complex structs of the Genbank could have some constructors with some nice defaults to help build up the test data if it is too complex to define the struct in place or if we find that it takes too much boiler plate to define over and over again. This should make it easier to add and reason about test data.
- This gets us part of the way to implementing writing to disk!
This is probably overkill for header and locus, but maybe good to keep in mind for the future. Having golden tests would be super helpful once we're ready to test some complete Genbank files. This would make it trivial to add test cases and work through any problems if bugs reported.
Also, maybe it's good to have a few different test tables instead of the one for header? That way the test case itself is a little simpler. For instance, TestParseHeaderHappy
and TestParseHeaderSad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. We need write ability - makes sense to implement this along the way so we can just do round-trip tests! Will go back and do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest commit implements these suggestions for the header:
- separate "happy" and "sad" tests into
TestXXXRoundtrip
andTestXXXErr
- make tests roundtrip instead
Still need to do this for the rest of the spec though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm! I'd say ship it!
This PR has had no activity in the past 2 months. Marking as |
This PR has had no activity in the past 2 months. Marking as |
Changes in this PR
Rewrite of the Genbank parser. Currently in a draft state.
Why are you making these changes?
See #434 .
Are any changes breaking? (IMPORTANT)
Yes. Changes to the structure and methods of the
Genbank
struct, as well as the API for the parser.Pre-merge checklist
All of these must be satisfied before this PR is considered
ready for merging. Mergeable PRs will be prioritized for review.
primers/primers_test.go
for what this might look like.CHANGELOG.md
in the[Unreleased]
section.