Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genbank parser rewrite #437

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

carreter
Copy link
Collaborator

Changes in this PR

Rewrite of the Genbank parser. Currently in a draft state.

Why are you making these changes?

See #434 .

Are any changes breaking? (IMPORTANT)

Yes. Changes to the structure and methods of the Genbank struct, as well as the API for the parser.

Pre-merge checklist

All of these must be satisfied before this PR is considered
ready for merging. Mergeable PRs will be prioritized for review.

  • New packages/exported functions have docstrings.
  • New/changed functionality is thoroughly tested.
  • New/changed functionality has a function giving an example of its usage in the associated test file. See primers/primers_test.go for what this might look like.
  • Changes are documented in CHANGELOG.md in the [Unreleased] section.
  • All code is properly formatted and linted.
  • The PR template is filled out.

wantErr bool
}{{
name: "parses example header from spec",
data: `GBBCT1.SEQ Genetic Sequence Data Bank
Copy link
Contributor

@TwFlem TwFlem Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the cases where wantErr is nil, perhaps it could be helpful to define data in terms of want.

So for instance, Header could have a method like

func (h Header) bytes() []bytes {
  strbuilder := strings.Builder{}
  for _, r := range h.FileName {
     strbuilder.WriteRune(r)
  }
  for i := 0; i < paddingBeforeTitle-len(h.FileName); i++ {
      strbuilder.WriteByte(" "[0]) // maybe hard code some common byte values for these :shrug: 
   }
  for _, r := range h.Title {
     strbuilder.WriteRune(r)
  }
  strbuilder.WriteByte("\n"[0])
  // etc.
}


	for _, tc := range testcases {
		t.Run(tc.name, func(t *testing.T) {
                        var p *Parser
                        if data == "" {
                            p = NewParser(strings.NewReader(tc.want)
                       } else {
			    p = NewParser(strings.NewReader(tc.data))
                       }
			got, err := p.parseHeader()

			if gotErr := err != nil; gotErr != tc.wantErr {
				t.Fatalf("incorrect error returned from parseHeader, wantErr: %v, err: %v", tc.wantErr, err)
			}

			if diff := cmp.Diff(tc.want, got); !tc.wantErr && diff != "" {
				t.Fatalf("parseHeader returned incorrect header, (-want, +got): %v", diff)
			}

		})
	}

This should do 2 very nice things:

  1. Contributors don't need to to spend time aligning multi line string literals in accordance to the spec. All they have to do is define a Struct. More complex structs of the Genbank could have some constructors with some nice defaults to help build up the test data if it is too complex to define the struct in place or if we find that it takes too much boiler plate to define over and over again. This should make it easier to add and reason about test data.
  2. This gets us part of the way to implementing writing to disk!

This is probably overkill for header and locus, but maybe good to keep in mind for the future. Having golden tests would be super helpful once we're ready to test some complete Genbank files. This would make it trivial to add test cases and work through any problems if bugs reported.

Also, maybe it's good to have a few different test tables instead of the one for header? That way the test case itself is a little simpler. For instance, TestParseHeaderHappy and TestParseHeaderSad

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. We need write ability - makes sense to implement this along the way so we can just do round-trip tests! Will go back and do this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit implements these suggestions for the header:

  • separate "happy" and "sad" tests into TestXXXRoundtrip and TestXXXErr
  • make tests roundtrip instead

Still need to do this for the rest of the spec though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! I'd say ship it!

io/genbank/parser.go Outdated Show resolved Hide resolved
io/genbank/parser.go Outdated Show resolved Hide resolved
io/genbank/parser.go Outdated Show resolved Hide resolved
@carreter carreter changed the title Beginning of rewrite. Can parse header and LOCUS. Genbank parser rewrite Jan 15, 2024
Copy link

This PR has had no activity in the past 2 months. Marking as stale.

@github-actions github-actions bot added the stale label Mar 18, 2024
@github-actions github-actions bot removed the stale label Apr 30, 2024
Copy link

This PR has had no activity in the past 2 months. Marking as stale.

@github-actions github-actions bot added the stale label Jun 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants