Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Subtask] Fix duplicate-code Pylint rule for client-python #3767

Open
noidname01 opened this issue Jun 4, 2024 · 6 comments
Open

[Subtask] Fix duplicate-code Pylint rule for client-python #3767

noidname01 opened this issue Jun 4, 2024 · 6 comments
Labels
good first issue Good for newcomers subtask Subtasks of umbrella issue

Comments

@noidname01
Copy link
Collaborator

Describe the subtask

  • Currently Pylint thinks our code has several duplicate code in different files, please try to refactor them. If you think there is no need to refactor duplicate code, please explain why and we can disable this rule to take effects.

Parent issue

#3560

@noidname01 noidname01 added the subtask Subtasks of umbrella issue label Jun 4, 2024
@noidname01 noidname01 changed the title [Subtask] Fix duplicate-code Pylint rule for client-python subtask [Subtask] Fix duplicate-code Pylint rule for client-python Jun 4, 2024
@xunliu xunliu added the good first issue Good for newcomers label Jun 4, 2024
@ian910297
Copy link
Contributor

I’m interested in working on this issue.

I ran an initial pylint test. It looks like there are 20 issues to fix. I will submit them in batches.

@noidname01
Copy link
Collaborator Author

Hi @ian910297, thanks for your interest in this issue. If you have any question, feel free to comment and let us know👍

@ian910297
Copy link
Contributor

@noidname01 I've outlined some scenarios, but I'd like to discuss them before writing the code.

  1. exception
    Exception declarations are scattered across multiple files. I want to consolidate them into a single file, base.py, located at clients/clitent-python/gravitino/exceptions/base.py. There are also identically named exception classes in gravitino.client.gravitino_client and gravitino.client.gravitino_metalake, which I will also include in base.py.

reference: [SPARK-42342][PYTHON][CONNECT] Introduce base hierarchy to exceptions #39882

  1. Unit and integration tests are allowed to have duplicate code in specific files.
audit_dto = AuditDTO(
            _creator="test",
            _create_time="2022-01-01T00:00:00Z",
            _last_modifier="test",
            _last_modified_time="2024-04-05T10:10:35.218Z",
        )
  1. no idea: how to refactor the getter function of private variables
    This code doesn't look very Pythonic. If we want to remove the duplication, we may need to rewrite it more systematically.
    @dataclass
    class SetProperty:
        """A fileset change to set the property and value for the fileset."""

        _property: str = field(metadata=config(field_name="property"))
        _value: str = field(metadata=config(field_name="value"))

        def property(self):
            """Retrieves the name of the property being set in the fileset.

            Returns:
                 The name of the property.
            """
            return self._property

@noidname01
Copy link
Collaborator Author

noidname01 commented Jul 4, 2024

Hi @ian910297, thanks for your detailed descriptions.

  1. About exception, actually I've created an issue recently to refactor current error handling([Improvement] Refactor Error Handling in client-python #4012).
    It seems that exception structure is related to many issues, like [Subtask] Fix broad-exception-caught Pylint rule for client-python #3766, so maybe we can solve issue 4012 first? WDYT?

  2. I don't really get what it means, can you give more context?

  3. Yeah, I've also noticed this problem, because some of code in client-python were based on the logic in client-java, you'll see some coding style is more like Java, not Pythonic. I think we can create another subtask issue to discuss about how to rewrite this part.

@ian910297
Copy link
Contributor

Hi @noidname01

  1. exception
    I agree that we should prioritize resolving issue [Improvement] Refactor Error Handling in client-python #4012.

  2. unit test
    Pylint flags the following structure as duplicate code but fixed values expected in unit test.

AuditDTO(
            _creator="test",
            _create_time="2022-01-01T00:00:00Z",
            _last_modifier="test",
            _last_modified_time="2024-04-05T10:10:35.218Z",
        )

I've considered two solutions:

  • (Simpler) Disable duplicated code detection for specific file / directory.
  • (More complex) Use factory / builder design pattern to create object for test. Considering the potential increase in tests, it's crucial to evaluate the scalability and reusability of the chosen approach.
  1. pythonic code
    I agree that we should create a separate subtask for this, similar to case 1.

In summary, I believe addressing the duplicated code linting issue is not a priority at this stage. We should focus on reviewing the errors to identify potential architectural improvements.

@noidname01
Copy link
Collaborator Author

@ian910297 I think the reason for Problem 2 is that we use the same test component many times and create them in different files, maybe give them different values can solve the problem?

But the problem that every test repeats the same init logic still remains, I think we can move the same logic to parent class (ex. IntegrationTestEnv in integration tests) to prevent duplicate code, WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers subtask Subtasks of umbrella issue
Projects
None yet
Development

No branches or pull requests

3 participants