add ABINet [WIP, don't merge yet] #385
Merged

31 commits
bf6ae5b  Add ABINet 0616
da0840b  Add ABINet Modify README.md
ddc5718  Add ABINet 0616
c99f3d5  Add ABINet Modify README.md
a769356  Merge branch 'dev' into main
31d03ce  Add ABINet 0621
265691c  Add ABINet 0621
0089f47  Add ABINet 0622
90ee7f1  Merge branch 'main' into main
1fa8970  Add ABINet 0623 (safeandnewYH)
c1c81d1  Add ABINet 0624
2dd2d0b  Add ABINet 0624 add README_CN.md
3481004  Add ABINet 0626
e7751a4  Add ABINet 0626
92b9e8e  Add ABINet 0626
7215233  Add ABINet 0626
dcdf056  Add ABINet 0628
c8c2133  Add ABINet 0629
d5eafdb  Add ABINet 0629
e88dea7  Merge branch 'main' into main
c8153db  Add ABINet 0630 (safeandnewYH)
34ae07d  Add ABINet 0630
13ed298  Add ABINet 0630
274beac  Update ABINet README.md
aa3823a  Update ABINet README_CN.md (safeandnewYH)
51456a3  Update ABINet README.md (safeandnewYH)
92fd1d0  Update ABINet README_CN.md (safeandnewYH)
2323870  Update README.md (safeandnewYH)
b3f30d1  Update README_CN.md (HaoyangLee)
56b669e  Update rec_abinet_transforms.py (HaoyangLee)
3b0c61e  Merge branch 'main' into main (Songyuanwei)
New file: ABINet training/evaluation config (YAML)
@@ -0,0 +1,110 @@
system:
  mode: 0  # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: True
  amp_level: 'O0'
  seed: 42
  log_interval: 100
  val_while_train: False
  drop_overflow_update: False

common:
  character_dict_path: &character_dict_path
  num_classes: &num_classes 37
  max_text_len: &max_text_len 25
  infer_mode: &infer_mode False
  use_space_char: &use_space_char False
  batch_size: &batch_size 96

model:
  type: rec
  pretrained: "./tmp_rec/pretrain.ckpt"
  transform: null
  backbone:
    name: abinet_backbone
    pretrained: False
    batchsize: *batch_size
  head:
    name: ABINetHead
    batchsize: *batch_size

postprocess:
  name: ABINetLabelDecode

metric:
  name: RecMetric
  main_indicator: acc
  character_dict_path: *character_dict_path
  ignore_space: True
  print_flag: False
  filter_ood: False

loss:
  name: ABINetLoss

scheduler:
  scheduler: step_decay
  decay_rate: 0.1
  decay_epochs: 6
  warmup_epochs: 0
  lr: 0.0001
  num_epochs: 10

optimizer:
  opt: adam

train:
  clip_grad: True
  clip_norm: 20.0
  ckpt_save_dir: './tmp_rec'
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: path/to/data_lmdb_release/
    data_dir: train/
    # label_files: # not required when using LMDBDataset
    sample_ratio: 1.0
    shuffle: True
    transform_pipeline:
      - ABINetTransforms:
      - ABINetRecAug:
      - NormalizeImage:
          is_hwc: False
          mean: [0.485, 0.456, 0.406]
          std: [0.485, 0.456, 0.406]
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: ['image', 'label', 'length', 'label_for_mask']  # 'img_path'

  loader:
    shuffle: True  # TODO: tbc
    batch_size: *batch_size
    drop_remainder: True
    max_rowsize: 128
    num_workers: 20

eval:
  ckpt_load_path: ./tmp_rec/best.ckpt
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: path/to/data_lmdb_release/
    data_dir: evaluation/
    # label_files: # not required when using LMDBDataset
    sample_ratio: 1.0
    shuffle: False
    transform_pipeline:
      - ABINetEvalTransforms:
      - ABINetEval:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: ['image', 'label', 'length', 'label_for_mask']  # TODO: return text string padded to a fixed length, and a scalar to indicate the length
    net_input_column_index: [0]  # input indices for network forward func in output_columns
    label_column_index: [1, 2]  # input indices marked as label

  loader:
    shuffle: False  # TODO: tbc
    batch_size: *batch_size
    drop_remainder: False
    max_rowsize: 128
    num_workers: 8
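The config above relies on YAML anchors (&name) and aliases (*name) so that a single value, such as batch_size, stays consistent across the model head, the train loader, and the eval loader. Below is a minimal sketch of how those anchors resolve when the file is loaded with PyYAML; the file name abinet_config.yaml is an assumption made purely for illustration.

# Minimal sketch: load the config and check that the YAML aliases resolve as expected.
# "abinet_config.yaml" is an assumed file name for this example.
import yaml

with open("abinet_config.yaml", "r") as f:
    cfg = yaml.safe_load(f)

# Every *batch_size alias points at the single &batch_size anchor (96),
# so model.head, train.loader and eval.loader cannot drift apart.
assert cfg["common"]["batch_size"] == 96
assert cfg["model"]["head"]["batchsize"] == cfg["common"]["batch_size"]
assert cfg["train"]["loader"]["batch_size"] == cfg["eval"]["loader"]["batch_size"] == 96

# character_dict_path is an anchor with no value, i.e. null; presumably this selects
# the framework's default lowercase alphanumeric charset, which is consistent with
# num_classes: 37 (36 characters plus one extra class).
assert cfg["common"]["character_dict_path"] is None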
New file: rec_abinet_transforms.py
@@ -0,0 +1,226 @@
""" | ||
transform for text recognition tasks. | ||
""" | ||
import copy | ||
import logging | ||
import random | ||
import re | ||
import warnings | ||
|
||
import cv2 | ||
import numpy as np | ||
import PIL | ||
import six | ||
from PIL import Image | ||
|
||
import mindspore.dataset as ds | ||
|
||
from ...models.utils.abinet_layers import CharsetMapper, onehot | ||
from .svtr_transform import ( | ||
CVColorJitter, | ||
CVGaussianNoise, | ||
CVMotionBlur, | ||
CVRandomAffine, | ||
CVRandomPerspective, | ||
CVRandomRotation, | ||
CVRescale, | ||
) | ||
|
||
_logger = logging.getLogger(__name__) | ||
__all__ = ["ABINetTransforms", "ABINetRecAug", "ABINetEval", "ABINetEvalTransforms"] | ||
|
||
|
||
class ABINetTransforms(object):
    """Convert a text label (str) to a sequence of character indices according to the character dictionary."""

    def __init__(self):
        # ABINet transforms
        self.case_sensitive = False
        self.charset = CharsetMapper(max_length=26)

    def __call__(self, data: dict):
        img_lmdb = data["img_lmdb"]
        label = data["label"]
        label = label.encode("utf-8")
        label = str(label, "utf-8")
        try:
            # keep alphanumeric characters only and clip to the 25-char limit
            label = re.sub("[^0-9a-zA-Z]+", "", label)
            if len(label) > 25 or len(label) <= 0:
                string_false2 = f"len(label) > 25 or len(label) <= 0: {label}, {len(label)}"
                _logger.warning(string_false2)
            label = label[:25]
            buf = six.BytesIO()
            buf.write(img_lmdb)
            buf.seek(0)
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", UserWarning)
                image = PIL.Image.open(buf).convert("RGB")
            if not _check_image(image, pixels=6):
                string_false1 = f"_check_image false: {label}, {len(label)}"
                _logger.warning(string_false1)
        except Exception:
            string_false = f"Corrupted image is found: {label}, {len(label)}"
            _logger.warning(string_false)

        image = np.array(image)

        text = label

        # +1 for the end-of-sequence position
        length = len(text) + 1
        length = float(length)

        label = self.charset.get_labels(text, case_sensitive=self.case_sensitive)
        label_for_mask = copy.deepcopy(label)
        # mark the end-of-sequence position in the mask label
        label_for_mask[int(length - 1)] = 1
        label = onehot(label, self.charset.num_classes)
        data_dict = {"image": image, "label": label, "length": length, "label_for_mask": label_for_mask}
        return data_dict

class ABINetRecAug(object):
    def __init__(self):
        self.transforms = ds.transforms.Compose(
            [
                CVGeometry(
                    degrees=45,
                    translate=(0.0, 0.0),
                    scale=(0.5, 2.0),
                    shear=(45, 15),
                    distortion=0.5,
                    p=0.5,
                ),
                CVDeterioration(var=20, degrees=6, factor=4, p=0.25),
                CVColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.1, p=0.25),
            ]
        )
        self.toTensor = ds.vision.ToTensor()
        self.w = 128
        self.h = 32

    def __call__(self, data):
        img = data["image"]
        img = self.transforms(img)
        # resize to 128x32 and convert to a CHW float32 tensor in [0, 1]
        img = cv2.resize(img, (self.w, self.h))
        img = self.toTensor(img)
        data["image"] = img
        return data


def _check_image(x, pixels=6):
    # reject images whose width or height is `pixels` or fewer
    if x.size[0] <= pixels or x.size[1] <= pixels:
        return False
    else:
        return True

class ABINetEvalTransforms(object):
    """Convert a text label (str) to a sequence of character indices according to the character dictionary."""

    def __init__(self):
        # ABINet transforms
        self.case_sensitive = False
        self.charset = CharsetMapper(max_length=26)

    def __call__(self, data: dict):
        img_lmdb = data["img_lmdb"]
        label = data["label"]
        label = label.encode("utf-8")
        label = str(label, "utf-8")
        try:
            label = re.sub("[^0-9a-zA-Z]+", "", label)
            if len(label) > 25 or len(label) <= 0:
                string_false2 = f"len(label) > 25 or len(label) <= 0: {label}, {len(label)}"
                _logger.warning(string_false2)
            label = label[:25]
            buf = six.BytesIO()
            buf.write(img_lmdb)
            buf.seek(0)
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", UserWarning)
                image = PIL.Image.open(buf).convert("RGB")
            if not _check_image(image, pixels=6):
                string_false1 = f"_check_image false: {label}, {len(label)}"
                _logger.warning(string_false1)
        except Exception:
            string_false = f"Corrupted image is found: {label}, {len(label)}"
            _logger.warning(string_false)

        image = np.array(image)

        text = label
        length = len(text) + 1
        length = float(length)
        data_dict = {"image": image, "label": text, "length": length}
        return data_dict


class ABINetEval(object):
    def __init__(self):
        self.toTensor = ds.vision.ToTensor()
        self.w = 128
        self.h = 32

    def __call__(self, data):
        img = data["image"]
        img = cv2.resize(img, (self.w, self.h))
        img = self.toTensor(img)
        data["image"] = img
        length = data["length"]
        length = int(length)
        data["length"] = length
        return data

class CVGeometry(object):
    def __init__(self, degrees=15, translate=(0.3, 0.3), scale=(0.5, 2.0), shear=(45, 15), distortion=0.5, p=0.5):
        self.p = p
        # the geometric op (rotation / affine / perspective) is chosen once at construction time
        type_p = random.random()
        if type_p < 0.33:
            self.transforms = CVRandomRotation(degrees=degrees)
        elif type_p < 0.66:
            self.transforms = CVRandomAffine(degrees=degrees, translate=translate, scale=scale, shear=shear)
        else:
            self.transforms = CVRandomPerspective(distortion=distortion)

    def __call__(self, img):
        if random.random() < self.p:
            img = np.array(img)
            return Image.fromarray(self.transforms(img))
        else:
            return img


class CVDeterioration(object):
    def __init__(self, var, degrees, factor, p=0.5):
        self.p = p
        transforms = []
        if var is not None:
            transforms.append(CVGaussianNoise(var=var))
        if degrees is not None:
            transforms.append(CVMotionBlur(degrees=degrees))
        if factor is not None:
            transforms.append(CVRescale(factor=factor))

        # the order of noise / blur / rescale is shuffled once here and then fixed
        random.shuffle(transforms)

        transforms = ds.transforms.Compose(transforms)
        self.transforms = transforms

    def __call__(self, img):
        if random.random() < self.p:
            img = np.array(img)
            return Image.fromarray(self.transforms(img))
        else:
            return img
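For orientation, the evaluation pipeline configured above chains ABINetEvalTransforms (decode the LMDB image bytes and clean the label) with ABINetEval (resize to 128x32 and convert to a CHW float tensor). A standalone sketch of that chain follows; the dummy JPEG bytes and label are assumptions made only for illustration, and the commented import path is a guess based on the module layout.

# Standalone sketch of the eval transform chain defined above: decode -> resize -> ToTensor.
# The dummy image bytes and label are illustrative assumptions, not real LMDB data.
import cv2
import numpy as np

# from mindocr.data.transforms.rec_abinet_transforms import ABINetEvalTransforms, ABINetEval  # assumed path

dummy = np.random.randint(0, 255, (48, 160, 3), dtype=np.uint8)
_, encoded = cv2.imencode(".jpg", dummy)        # stands in for the raw bytes read from LMDB
sample = {"img_lmdb": encoded.tobytes(), "label": "Hello-123"}

sample = ABINetEvalTransforms()(sample)         # decode to RGB, strip non-alphanumerics -> "Hello123"
sample = ABINetEval()(sample)                   # resize to 128x32, HWC uint8 -> CHW float32 in [0, 1]

print(sample["image"].shape)                    # (3, 32, 128)
print(sample["label"], sample["length"])        # Hello123 9  (len("Hello123") + 1 for the EOS slot)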
Review comment: Do not use six! six is only needed for backward compatibility with Python 2.x, and our minimum supported version of Python is 3.7!
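A possible way to address this review comment: on Python 3.7+, io.BytesIO from the standard library is a drop-in replacement for six.BytesIO, so the six dependency can be dropped from the transforms. The sketch below isolates the relevant lines; the helper name decode_lmdb_image is hypothetical.

# Sketch of the suggested change: replace six.BytesIO with io.BytesIO.
# decode_lmdb_image is a hypothetical helper used only to illustrate the change.
import io

import numpy as np
import PIL.Image


def decode_lmdb_image(img_lmdb: bytes) -> np.ndarray:
    # io.BytesIO(img_lmdb) replaces the six.BytesIO() + write() + seek(0) sequence
    buf = io.BytesIO(img_lmdb)
    image = PIL.Image.open(buf).convert("RGB")
    return np.array(image)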