Data pipeline refactoring #300

Closed
wants to merge 1 commit into from

hadipash
Collaborator

Thank you for your contribution to the MindOCR repo.
Before submitting this PR, please make sure:

Motivation

Refactored the data pipeline to follow MindData best practices, including:

  1. Use GeneratorDataset for data loading only.
  2. Use the dataset.map operation to apply data transformations and augmentations.
  3. Reduce the number of Python transformations by grouping them into a single operation.
  4. Group MindSpore operations as well.
  5. Switch to native MindSpore operations where possible (Decode, Normalize, HWC2CHW).
  6. Integrate MindRecord support.
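Points 1 to 3 can be illustrated with a small, framework-agnostic sketch. It does not reproduce MindOCR's classes: `RawSource`, `TransformChain`, and the three label transforms are hypothetical names, samples are plain dicts, and a list comprehension stands in for `dataset.map`. The source only loads raw records, while the chain bundles several Python transforms into one callable, so each sample enters Python code once instead of once per transform.

```python
from typing import Callable, Dict, Iterable, List

# In this sketch a sample is a plain dict of columns (an assumption; MindSpore
# passes the selected columns to Python callables as separate arguments).
Sample = Dict[str, object]
Transform = Callable[[Sample], Sample]


class RawSource:
    """Hypothetical GeneratorDataset-style source: loads records, transforms nothing."""

    def __init__(self, records: List[Sample]):
        self._records = records

    def __len__(self) -> int:
        return len(self._records)

    def __getitem__(self, idx: int) -> Sample:
        # Shallow copy so downstream transforms never mutate the stored record.
        return dict(self._records[idx])


class TransformChain:
    """Hypothetical helper that groups several Python transforms into one callable."""

    def __init__(self, transforms: Iterable[Transform]):
        self.transforms = list(transforms)

    def __call__(self, sample: Sample) -> Sample:
        # Apply every transform in order within a single call per sample.
        for transform in self.transforms:
            sample = transform(sample)
        return sample


def strip_label(sample: Sample) -> Sample:
    sample["label"] = str(sample["label"]).strip()
    return sample


def lowercase_label(sample: Sample) -> Sample:
    sample["label"] = str(sample["label"]).lower()
    return sample


def add_label_length(sample: Sample) -> Sample:
    sample["length"] = len(str(sample["label"]))
    return sample


source = RawSource([{"image": b"raw-bytes", "label": "  Hello "}])
chain = TransformChain([strip_label, lowercase_label, add_label_length])

# Stand-in for dataset.map: one grouped Python call per sample.
outputs = [chain(source[i]) for i in range(len(source))]
print(outputs)
```

In MindSpore itself, a grouped callable like this would be passed to `dataset.map(operations=..., input_columns=...)`, which hands the selected columns over as separate arguments rather than as a dict, so the signature would need adapting. Native operations such as `mindspore.dataset.vision.Decode`, `Normalize`, and `HWC2CHW` (point 5) would typically be grouped into their own `map` call (point 4).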

@zhtmike
Collaborator

zhtmike commented May 22, 2023

I will finish SVTR first and then move on to this.

@zhtmike mentioned this pull request on May 23, 2023
4 tasks
hadipash added a commit that referenced this pull request Jun 2, 2023
- Fixed a bug that caused evaluation to crash if no postprocessing is set.
- Fixed importing the `DetBasePostprocess` class from outside its module and refactored it.
- Moved the training-condition check in `RandomScale` to initialization, as this condition only needs to be checked once.
- Reverted linting in `mindocr/data/builder.py` to avoid complicated merge conflicts with the new pipeline (#300).
- Set `shape_list` as a NumPy array throughout the project.
- Removed polygon clipping in `DBPostprocess` to keep all clipping in one place (`DetBasePostprocess`).
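The `RandomScale` change, checking the training condition once at initialization rather than on every call, can be sketched as follows. This is not MindOCR's implementation: the constructor arguments, the `image_size` and `scale` keys, and rescaling a size tuple instead of an actual image are all assumptions made to keep the example dependency-free.

```python
import random
from typing import Dict, Optional, Tuple


class RandomScale:
    """Hypothetical sketch, not MindOCR's RandomScale: the train/eval decision is made once."""

    def __init__(
        self,
        scale_range: Tuple[float, float] = (1.0, 3.0),
        is_train: bool = True,
        seed: Optional[int] = None,
    ):
        self._rng = random.Random(seed)
        self._scale_range = scale_range
        # Checked once here instead of inside __call__: in evaluation mode the
        # transform is bound to a no-op, so per-sample calls skip the branch.
        self._apply = self._scale if is_train else self._identity

    def __call__(self, data: Dict) -> Dict:
        return self._apply(data)

    @staticmethod
    def _identity(data: Dict) -> Dict:
        return data

    def _scale(self, data: Dict) -> Dict:
        factor = self._rng.uniform(*self._scale_range)
        height, width = data["image_size"]
        # A real transform would resize the image; this sketch only rescales its size.
        data["image_size"] = (round(height * factor), round(width * factor))
        data["scale"] = factor
        return data


eval_scale = RandomScale(is_train=False)
train_scale = RandomScale(scale_range=(2.0, 2.0), is_train=True)
print(eval_scale({"image_size": (100, 200)}))
print(train_scale({"image_size": (100, 200)}))
```

Binding the branch once keeps the per-sample path free of a condition that cannot change after construction.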
@hadipash
Collaborator Author

Rebased onto the main branch.

@hadipash
Collaborator Author

Re-opened in #416.

@hadipash hadipash closed this Jun 16, 2023

3 participants