Skip to content

fix: Missing parameters such as character length when creating a knowledge base #2827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 8, 2025

Conversation

shaohuzhang1
Copy link
Contributor

fix: Missing parameters such as character length when creating a knowledge base

Copy link

f2c-ci-robot bot commented Apr 8, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

f2c-ci-robot bot commented Apr 8, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shaohuzhang1 shaohuzhang1 merged commit 3557ea5 into main Apr 8, 2025
4 checks passed
@shaohuzhang1 shaohuzhang1 deleted the pr@main@fix_create_dataset branch April 8, 2025 09:24
'document_list': document_model_list,
"document_count": len(document_model_list),
"char_length": reduce(lambda x, y: x + y, [d.char_length for d in document_model_list],
0)}, dataset_id

@staticmethod
def get_last_url_path(url):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly! Here are some observations and improvements for the provided code:

  1. Code Structure: The function save has multiple responsibilities—saving, handling associations, and constructing the response—but it's split into two separate functions save_with_association and get_response. This can be refactored to merge these functionalities where appropriate.

  2. Bulk Insertion Logic: You're correctly using QuerySet.bulk_create, but consider adding a try-except block around it to handle any potential errors during bulk insertion, such as database constraint violations or other exceptions.

  3. Response Data:

    • Add checks to ensure that document_model_list is not empty before accessing its attributes like char_length.
    • Use type hints (Dict) explicitly where possible to clarify the expected types of inputs and outputs.
  4. User ID Handling: Ensure that user_id is properly set when returning the response.

Here’s an improved version of the code based on the above points:

from functools import reduce

def save(self, instance: Dict, with_valid=True):
    # Assume DataSetSerializers and ProblemParagraphMapping exist elsewhere in the code
    dataset_serializers = DataSetSerializers(instance)
    dataset_responses = {
        'id': instance.get('id')
    }

    problems = None
    if with_valid:
        problems = self._create_problems_with_relation(instance)

    user_id = None  # Define this variable somewhere in your class or pass it via arguments
    document_mappings = self._generate_document_mappings(instance, with_valid)

    if problems:
        dataset_responses.update({
            'problems': problems,
            'document_relations': document_mappings,
            'dataset_list': DocumentSerializers.Query(data={'dataset_id': instance['id']}).list(with_valid=True)}
        )
    
    return dataset_responses

@staticmethod
def _create_problems_with_relation(dataset_data: dict) -> list:
    problems = []
    # Implement logic to create new problems and their mappings here
    return problems

@staticmethod
def _generate_document_mappings(dataset_data: dict, with_valid=True) -> List[dict]:
    document_models = []  # Fetch all document models related to the dataset
    
    if with_valid:
        mapping_list = []
        for doc in document_models:
            mapping_dict = {'doc_id': doc.id, 'dataset_id': doc.dataset_id}
            mapping_list.append(mapping_dict)
        
        QuerySet(ProblemParagraphMapping).bulk_create(mapping_list)
    
    for doc in document_models:
        document_responses = {
            'id': str(doc.id),
            'name': doc.name,
            'description': doc.description,
            'user_id': user_id,  # Assuming user_id is defined somewhere else
            'paragraphs': doc.paragraphs
        }
        document_responses['size'] = sum(len(x) for x in [resp["description"] for resp in (document_responses)] * len([x.split("\n") for x in document_responses]))
        document_requests = ResponseDocumentsSerializer.create_instances(
                data=[docResponses]
                )[0]  
        document_responses.pop("paragraphs")
        return document_requests

@staticmethod
def save_with_association(instance: Dict, with_valid=True):
    problem_paragraph_mapping_list = [
        ProblemParagraphMapping(...) for ... in ...
    ]
    if len(problem_paragraph_mapping_list) > 0:
        BulkCreateManager.objects.bulk_create(problem_paragraph_mapping_list)

@staticmethod
def get_response(data: dict, dataset_id, with_valid=True) -> Dict[str, Any]:
    return {
        **DataSetSerializers(data).data,
        'document_list': DocumentListSerializer.data(data),
        "document_count": len(DocumentListSerializer.data(data)),
        "char_length": sum(reduce(lambda x, y: x + y, [d.char_length for d in Data]), 0)}, dataset_id

These changes aim to improve clarity, maintainability, and robustness of the original code. Adjustments might depend heavily on actual context and classes used within the surrounding environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant