Final capture replace filtering #36

Merged
merged 10 commits into from
Aug 10, 2022

Commits on Aug 3, 2022

  1. fe2eb2e
  2. Solved issue 34

    Add a command-line option to exclude "unknown protein" or "unknown sequence family" from results
    asishallab committed Aug 3, 2022
    821ed00
  3. Solved issue 33

    Currently, phrases (i.e., subsets of candidate descriptions) that consist only of non-informative words are still scored and may be assigned as the final annotation. This should be changed.
    
    If the set of informative words is empty, classify the protein or sequence family as "unknown".
    asishallab committed Aug 3, 2022
    dd968c5
  4. Updated the manual to include the new option

    The new option is '-x'
    asishallab committed Aug 3, 2022
    9ae6196
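
The two issue fixes above (33 and 34) can be sketched in Rust. This is a minimal illustration under assumptions, not the project's code: the stop-word list, the function names `annotate` and `filter_results`, and the representation of results as (query, HRD) pairs are hypothetical, and treating `-x` as the exclusion switch from issue 34 is inferred from the commit order.

```rust
use std::collections::HashSet;

// Labels named in issue 34; the exact strings the tool emits are an assumption.
const UNKNOWN_PROTEIN: &str = "unknown protein";
const UNKNOWN_FAMILY: &str = "unknown sequence family";

/// Issue 33 (hypothetical helper): a phrase without any informative word
/// is not kept; the query is classified as "unknown" instead.
fn annotate(phrase: &str, non_informative: &HashSet<&str>, unknown_label: &str) -> String {
    let has_informative_word = phrase
        .split_whitespace()
        .any(|w| !non_informative.contains(w.to_lowercase().as_str()));
    if has_informative_word {
        phrase.to_string() // scoring of informative phrases is omitted here
    } else {
        unknown_label.to_string()
    }
}

/// Issue 34 (hypothetical helper): with the exclude switch on, drop every
/// (query, HRD) result whose HRD is one of the "unknown" labels.
fn filter_results(results: Vec<(String, String)>, exclude_unknown: bool) -> Vec<(String, String)> {
    results
        .into_iter()
        .filter(|(_, hrd)| {
            let is_unknown = hrd.as_str() == UNKNOWN_PROTEIN || hrd.as_str() == UNKNOWN_FAMILY;
            !(exclude_unknown && is_unknown)
        })
        .collect()
}

fn main() {
    // Hypothetical stop-word list; the tool's real list is not shown here.
    let non_informative: HashSet<&str> = ["a", "and", "of", "or", "the"].iter().copied().collect();
    let results = vec![
        ("q1".to_string(), annotate("kinase domain", &non_informative, UNKNOWN_PROTEIN)),
        ("q2".to_string(), annotate("of the and", &non_informative, UNKNOWN_PROTEIN)),
    ];
    let kept = filter_results(results, true);
    println!("{:?}", kept); // only q1 survives
}
```

Without the exclude switch, `filter_results` returns its input unchanged, so "unknown" annotations stay visible by default.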

Commits on Aug 4, 2022

  1. 5aa49d2
  2. Improved filtering of Blast stitles

    - Replaced the standard Rust regex crate with fancy-regex in the capture-replace-pairs,
      thus allowing for (named) backreferences.
    - Using named backreferences, repeated occurrences of the same word are replaced
      with the first occurrence, i.e., every subsequent occurrence is deleted.
    asishallab committed Aug 4, 2022
    4d0bebb
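
The effect of the deduplication described above (only the first occurrence of a word survives) can be illustrated without fancy-regex. The sketch below is a std-only stand-in, not the committed backreference pattern: it splits the title on whitespace, compares words case-sensitively (both assumptions), and uses a set to drop repeats. The function name `keep_first_occurrences` is hypothetical.

```rust
use std::collections::HashSet;

/// Keep only the first occurrence of each word in a BLAST stitle.
/// Std-only stand-in for the fancy-regex backreference approach; words are
/// split on whitespace and compared case-sensitively (assumptions).
fn keep_first_occurrences(stitle: &str) -> String {
    let mut seen: HashSet<&str> = HashSet::new();
    stitle
        .split_whitespace()
        // `HashSet::insert` returns false for a word already seen, so repeats are dropped.
        .filter(|w| seen.insert(*w))
        .collect::<Vec<&str>>()
        .join(" ")
}

fn main() {
    let stitle = "ABC transporter ATP-binding protein ABC transporter";
    println!("{}", keep_first_occurrences(stitle));
    // prints "ABC transporter ATP-binding protein"
}
```

Unlike a regex replacement, this version also collapses runs of whitespace to single spaces, which a pattern-based approach would not necessarily do.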

Commits on Aug 5, 2022

  1. Added option to polish HRDs

    - Polishing iteratively applies capture-replace-pairs (a fancy-regex and a replace instruction)
      to the human-readable descriptions (HRDs) assigned to the queries (families or proteins).
    - Among other things, this is used to remove terminal non-informative words such as
      '[...] and', '[...] the', or '[...] or'.
    - The polishing step can be suppressed (skipped) by passing "none" to the new command-line
      option --polish-capture-replace-pairs (-d). Use the same option to provide
      custom capture-replace-pairs.
    asishallab committed Aug 5, 2022
    a482951
  2. Debugged polishing

    - The problem was caused by mutability and by iterating over mutable references.
    asishallab committed Aug 5, 2022
    1628f17
  3. 64961fa
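
The polishing step can be sketched as a fixpoint iteration: apply every capture-replace-pair, and repeat until a full pass no longer changes the HRD. To keep the example dependency-free, each pair is modeled as a plain Rust function instead of a fancy-regex pattern plus replacement; `Rule`, `strip_terminal_filler`, `polish`, and `polish_all` are hypothetical names, and the filler list is an assumption. Stopping at a fixpoint is one plausible reading of "iteratively applies"; the source does not state the project's stopping condition.

```rust
/// Hypothetical stand-in for one capture-replace-pair. The tool pairs a
/// fancy-regex pattern with a replace instruction; here a rule is simply a
/// function that rewrites the HRD.
type Rule = fn(&str) -> String;

/// Example rule (hypothetical): remove one terminal non-informative word.
fn strip_terminal_filler(hrd: &str) -> String {
    const FILLERS: [&str; 3] = ["and", "the", "or"];
    let trimmed = hrd.trim_end();
    match trimmed.rsplit_once(' ') {
        Some((head, last)) if FILLERS.contains(&last) => head.trim_end().to_string(),
        _ => trimmed.to_string(),
    }
}

/// Apply every rule in turn, repeating full passes until a pass changes
/// nothing. An empty rule list (the "none" case) returns the HRD unchanged.
fn polish(hrd: &str, rules: &[Rule]) -> String {
    let mut current = hrd.to_string();
    loop {
        let mut next = current.clone();
        for rule in rules {
            next = rule(next.as_str());
        }
        if next == current {
            return current;
        }
        current = next;
    }
}

/// Polish all HRDs in place through mutable references.
fn polish_all(hrds: &mut [String], rules: &[Rule]) {
    for hrd in hrds.iter_mut() {
        // Compute the new value from a shared reborrow first, then overwrite.
        let polished = polish(hrd.as_str(), rules);
        *hrd = polished;
    }
}

fn main() {
    let rules: [Rule; 1] = [strip_terminal_filler];
    let mut hrds = vec!["ABC transporter and the".to_string(), "kinase or".to_string()];
    polish_all(&mut hrds, &rules);
    println!("{:?}", hrds); // ["ABC transporter", "kinase"]
}
```

The example rule only ever shortens the text, so the loop is guaranteed to terminate; a rule set that could lengthen an HRD would need an iteration cap.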

Commits on Aug 6, 2022

  1. Updated README and included example

    Added an example polish-capture-replace-pairs file in misc.
    asishallab committed Aug 6, 2022
    5240052
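
Resolving the value of --polish-capture-replace-pairs (-d) might look like the sketch below. Only the "none" behavior comes from the Aug 5 commit message; everything about the file is an assumption. The actual format of the example file in misc is not shown here, so the sketch invents a simple one (one pair per line, pattern and replacement separated by a tab, lines starting with '#' ignored), and `parse_pairs` and `resolve_polish_option` are hypothetical names.

```rust
use std::fs;
use std::io;

/// Hypothetical file format (an assumption, not taken from the project):
/// one capture-replace-pair per line, pattern and replacement separated by
/// a tab; blank lines and lines starting with '#' are ignored.
fn parse_pairs(content: &str) -> Vec<(String, String)> {
    content
        .lines()
        .filter(|line| !line.trim().is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('\t'))
        .map(|(pattern, replacement)| (pattern.to_string(), replacement.to_string()))
        .collect()
}

/// "none" disables polishing; any other value is read as a file path
/// (the path interpretation is an assumption).
fn resolve_polish_option(value: &str) -> io::Result<Vec<(String, String)>> {
    if value == "none" {
        return Ok(Vec::new());
    }
    let content = fs::read_to_string(value)?;
    Ok(parse_pairs(&content))
}

fn main() {
    // Hypothetical file content: two patterns with empty replacements.
    let content = "# drop terminal fillers\n\\s+and$\t\n\\s+the$\t\n";
    for (pattern, replacement) in parse_pairs(content) {
        println!("{:?} -> {:?}", pattern, replacement);
    }
    match resolve_polish_option("none") {
        Ok(pairs) => println!("pairs after \"none\": {}", pairs.len()), // 0: polishing skipped
        Err(e) => eprintln!("could not read pairs file: {}", e),
    }
}
```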