-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CodeCamp #116 Add SROIE to dataset preparer #1639
CodeCamp #116 Add SROIE to dataset preparer #1639
Conversation
if '*' in src.split('/')[-1] and not osp.exists(dst): | ||
os.makedirs(dst) | ||
for f in glob.glob(src): | ||
if osp.exists(f): | ||
shutil.move(f, dst) | ||
continue | ||
|
||
if osp.exists(src) and not osp.exists(dst): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines can be optimized to be more general. The if
condition is too strict and may only be applicable to this dataset.
- Wildcard (*) does not usually appear in the file path, and therefore we can use
glob
to match all the files whenever there is*
in the file path. mmengine
has a utilmkdir_or_exist
to create the directory whenever needed.- Is line 156 really necessary? I think
glob.glob
should return a list of existing files. - Using continue to skip the next
if
block doesn't make the logic flow very clear. I'm in favor ofif...elif...
here
To conclude, the code can be rewritten as:
import mmengine
#...
if '*' in src:
mmengine.mkdir_or_exist(dst)
for f in glob.glob(src):
shutil.move(f, dst)
elif osp.exists(src) and not osp.exists(dst):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestions and I have modified this code block. But with #1639 (comment) fixed, this reports error like 'path already exists' when running task-recog after task-det. Keeping the not osp.exists(dst)
in if
statement will be fine. So it's like this
if '*' in src and not osp.exists(dst):
mkdir_or_exist(dst)
for f in glob.glob(src):
shutil.move(f, dst)
elif osp.exists(src) and not osp.exists(dst):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to address the key issue. Such an error occurs because shutil
will check the existence of the destination file. If dst
is just a directory path, shutil
will do the check on the resulting filename, which is equivalent to osp.exists(osp.join(dst, osp.basename(src))
.
Back to your implementation: ... and not osp.exists(dst)
only verifies the existence of the directory instead of a specific file. Suppose that we have placed an empty directory to dst
, this implementation will still skip moving all the files inside.
Therefore, it's better to just check the existence of each file with osp.exists(osp.join(dst, osp.basename(f))
and only skip those existing ones.
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
…yHuang/mmocr into ferry/add_SROIE_to_dataset_preparer
|
||
|
||
@DATA_PARSERS.register_module() | ||
class SROIETextDetAnnParser(BaseParser): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference with ICDARTextDetAnnParaser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference is the try-except structure added to handle unicode errors that I want to skip. Should I merge this into the ICDARTextDetAnnParser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good overall, but does not pass our lint test.
To fix it, please install pre-commit hooks following
https://mmocr.readthedocs.io/en/dev-1.x/notes/contribution_guide.html#commit-your-changes
and then run pre-commit run --all-files
, and it will help you fix the format issues.
Also, there is a conflict needed to be resolved before I merge it into dev-1.x:
You can fix it locally by running git merge dev-1.x
, then push the latest changes to your branch.
Conflicts: mmocr/datasets/preparers/parsers/__init__.py resolve conflicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I've updated the downloading link for SROIE, since the google drive links doesn't work.
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
CodeCamp-MMOCR-116
Modification
Support SROIE dataset (text detection, text recognition, text spotting) on the DatasetPreparer
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist
Before PR:
After PR: