fix: Knowledge base retrieval segmentation sorted by relevance #2791

Merged
shaohuzhang1 merged 1 commit into main from pr@main@fix_dataset_search
Apr 3, 2025

Conversation

shaohuzhang1
Contributor

fix: Knowledge base retrieval segmentation sorted by relevance

f2c-ci-robot bot commented Apr 3, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

f2c-ci-robot bot commented Apr 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 merged commit 1eccb54 into main Apr 3, 2025
4 checks passed
shaohuzhang1 deleted the pr@main@fix_dataset_search branch April 3, 2025 02:57
@@ -88,7 +88,7 @@ def execute(self, dataset_id_list, dataset_setting, question,
     'is_hit_handling_method_list': [row for row in result if row.get('is_hit_handling_method')],
     'data': '\n'.join(
         [f"{reset_title(paragraph.get('title', ''))}{paragraph.get('content')}" for paragraph in
-         paragraph_list])[0:dataset_setting.get('max_paragraph_char_number', 5000)],
+         result])[0:dataset_setting.get('max_paragraph_char_number', 5000)],
     'directly_return': '\n'.join(
         [paragraph.get('content') for paragraph in
          result if
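
The effect of the hunk above can be illustrated with a minimal sketch. It assumes that `result` holds the same paragraphs as `paragraph_list` but ordered by relevance, and that each paragraph is a dict carrying a `similarity` score; neither is shown in the diff. The `reset_title` body here is a hypothetical stand-in for the helper referenced in the hunk, and `build_data` is an illustrative name, not a function from the project.

```python
from typing import Any


def reset_title(title: str) -> str:
    """Hypothetical stand-in for the helper of the same name in the diff."""
    return f"#### {title}\n" if title else ""


def build_data(paragraphs: list[dict[str, Any]], max_chars: int = 5000) -> str:
    """Join title + content in the given order, then truncate to max_chars."""
    return "\n".join(
        f"{reset_title(p.get('title', ''))}{p.get('content')}" for p in paragraphs
    )[0:max_chars]


# Retrieval order (assumed here to be unrelated to relevance).
paragraph_list = [
    {"title": "A", "content": "low relevance", "similarity": 0.41},
    {"title": "B", "content": "high relevance", "similarity": 0.93},
]

# Assumption: `result` is the relevance-sorted view of the same paragraphs.
result = sorted(paragraph_list, key=lambda p: p["similarity"], reverse=True)

# With a tight character budget, the iteration order decides which
# paragraph survives the truncation.
before = build_data(paragraph_list, max_chars=20)
after = build_data(result, max_chars=20)
```

Because the joined string is cut at `max_paragraph_char_number`, iterating over the sorted list keeps the most relevant paragraphs inside the budget, whereas iterating over the unsorted list can truncate them away.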
Contributor Author

There are two minor issues in the given code:

  1. At line 86, there is an extra closing brace } after 'is_hit_handling_method', which seems unintended.

  2. At line 94, there is another unexpected brace { at the end of the list comprehension that extracts the paragraphs.

To fix these, modify lines 86 and 94 as follows:

'directly_return': ''.join(
                [paragraph.get('content') for paragraph in result if
                 paragraph.get('hit_status', 'False') == 'True']),

These changes ensure that both conditions are tested before appending to the final string, and they also correct the syntax errors caused by the unnecessary braces. This should improve readability, and the code should then function correctly, depending on how result is structured. However, please review the rest of your code to ensure it meets all functional requirements without introducing additional bugs or inefficiencies.
