Error when using the change detection models in PaddleRS #204

Closed
Programrookie33 opened this issue Jul 3, 2024 · 10 comments

@Programrookie33

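Thank you for reporting a PaddleRS issue. Please provide the following information so that we can locate and resolve the problem quickly:

  1. PaddleRS version: Not sure of the exact version. It is a copy cached in someone else's project; judging by the date, it should be later than the official 1.0 release.
  2. PaddlePaddle version: 2.4.0
  3. Operating system: Linux
  4. Python version: Python 3.7.4
  5. CUDA/cuDNN version: (e.g. CUDA 10.2/cuDNN 7.6.5)
  6. Full code: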

# Import the PaddleRS library

import paddlers as pdrs
import paddlers.transforms as T
import os.path as osp
import paddle

train_transforms = T.Compose([
    # Read the data
    T.DecodeImg(),
    # Random horizontal flip (disabled)
    # T.RandomHorizontalFlip(prob=0.2),
    # Random vertical flip (disabled)
    # T.RandomVerticalFlip(prob=0.3),
    # Normalize the data to [-1, 1]
    T.Normalize(
        mean=[0.5, 0.5, 0.5],
        std=[0.5, 0.5, 0.5]
    ),
    T.RandomFlipOrRotate(
        probs=[0.1, 0.1],  # p=0.1 to flip the image, p=0.1 to rotate it, p=0.8 to keep it unchanged
        probsf=[0.1, 0.1, 0, 0, 0],  # p=0.1 each for horizontal and vertical flipping; probability of no flipping is 0.8
        probsr=[0, 0.2, 0]),  # p=0.2 to rotate the image by 180°; probability of no rotation is 0.8
    T.ReloadMask(),
    # Select the data needed during training and arrange it in the specified order
    T.ArrangeChangeDetector('train')
])
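eval_transforms = T.Compose([
    T.DecodeImg(),
    # During validation, feed images at their original size and only normalize them
    # Validation must use the same normalization as training
    T.Normalize(
        mean=[0.5, 0.5, 0.5],
        std=[0.5, 0.5, 0.5]
    ),
    T.ReloadMask(),
    # Select the data needed during validation and arrange it in the specified order
    T.ArrangeChangeDetector('eval')
])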

# Directory containing the processed dataset

DATA_DIR = "/home/aistudio/work/Landslide_CD/"

train_dataset = pdrs.datasets.CDDataset(
    data_dir=DATA_DIR,
    file_list=osp.join(DATA_DIR, 'train.txt'),
    transforms=train_transforms,
    label_list=None,
    # Use 4 worker processes to load data
    num_workers=4,
    shuffle=True,
    # Automatically map labels with values in {0, 255} to {0, 1}
    binarize_labels=True
)
val_dataset = pdrs.datasets.CDDataset(
    data_dir=DATA_DIR,
    file_list=osp.join(DATA_DIR, 'val.txt'),
    transforms=eval_transforms,
    label_list=None,
    num_workers=0,
    shuffle=False,
    binarize_labels=True
)
# Model
model = pdrs.tasks.cd.FCCDN()
# Learning rate schedule
lr_scheduler = paddle.optimizer.lr.StepDecay(
    0.001,
    step_size=5000,
    # Learning rate decay factor; here the rate is halved at each step
    gamma=0.5
)
# Optimizer
optimizer = paddle.optimizer.Lamb(
    learning_rate=lr_scheduler,
    parameters=model.net.parameters()
)

# Run model training

model.train(
    num_epochs=50,
    train_dataset=train_dataset,
    train_batch_size=20,
    eval_dataset=val_dataset,
    optimizer=optimizer,
    save_interval_epochs=1,
    # Log once every this many iterations
    log_interval_steps=10,
    save_dir='/home/aistudio/work/output/fccdn',
    # pretrain_weights='/home/aistudio/work/output/cdnet/best_model/model.pdparams',
    # Whether to use early stopping, which ends training when accuracy stops improving
    early_stop=True,
    # Whether to enable VisualDL logging
    use_vdl=True,
    # Checkpoint to resume training from
    resume_checkpoint=None
)

  7. Detailed error information and related log:

ValueError Traceback (most recent call last)
/tmp/ipykernel_64988/2465525996.py in <module>
93 use_vdl=True,
94 # 指定从某个检查点继续训练
---> 95 resume_checkpoint=None
96 )

~/work/PaddleRS/paddlers/tasks/change_detector.py in train(self, num_epochs, train_dataset, train_batch_size, eval_dataset, optimizer, save_interval_epochs, log_interval_steps, save_dir, pretrain_weights, learning_rate, lr_decay_power, early_stop, early_stop_patience, use_vdl, resume_checkpoint)
333 early_stop=early_stop,
334 early_stop_patience=early_stop_patience,
--> 335 use_vdl=use_vdl)
336
337 def quant_aware_train(self,

~/work/PaddleRS/paddlers/tasks/base.py in train_loop(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, ema, early_stop, early_stop_patience, use_vdl)
372 outputs = self.train_step(step, data, ddp_net)
373 else:
--> 374 outputs = self.train_step(step, data, self.net)
375
376 scheduler_step(self.optimizer, outputs['loss'])

~/work/PaddleRS/paddlers/tasks/base.py in train_step(self, step, data, net)
663
664 def train_step(self, step, data, net):
--> 665 outputs = self.run(net, data, mode='train')
666
667 loss = outputs['loss']

~/work/PaddleRS/paddlers/tasks/change_detector.py in run(self, net, inputs, mode)
113
114 def run(self, net, inputs, mode):
--> 115 net_out = net(inputs[0], inputs[1])
116 logit = net_out[0]
117 outputs = OrderedDict()

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in __call__(self, *inputs, **kwargs)
928 return self.forward(*inputs, **kwargs)
929 else:
--> 930 return self._dygraph_call_func(*inputs, **kwargs)
931
932 def forward(self, *inputs, **kwargs):

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in _dygraph_call_func(self, *inputs, **kwargs)
913 outputs = self.forward(*inputs, **kwargs)
914 else:
--> 915 outputs = self.forward(*inputs, **kwargs)
916
917 for forward_post_hook in self._forward_post_hooks.values():

~/work/PaddleRS/paddlers/rs_models/cd/fccdn.py in forward(self, t1, t2)
453 y2 = self.block4(e3_2)
454
--> 455 y1 = self.center(y1)
456 y2 = self.center(y2)
457 c = self.df4(y1, y2)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in __call__(self, *inputs, **kwargs)
928 return self.forward(*inputs, **kwargs)
929 else:
--> 930 return self._dygraph_call_func(*inputs, **kwargs)
931
932 def forward(self, *inputs, **kwargs):

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in _dygraph_call_func(self, *inputs, **kwargs)
913 outputs = self.forward(*inputs, **kwargs)
914 else:
--> 915 outputs = self.forward(*inputs, **kwargs)
916
917 for forward_post_hook in self._forward_post_hooks.values():

~/work/PaddleRS/paddlers/rs_models/cd/fccdn.py in forward(self, x)
143 nl = self.nl3(d3)
144 d3 = self.upsample_x2(paddle.multiply(d3, nl)) ##2C,H/2,W/2
--> 145 d2 = self.conv_d2(e2 + d3) # C,H/2,W/2
146 nl = self.nl2(d2)
147 d2 = self.upsample_x2(paddle.multiply(d2, nl)) # C,H,W

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py in impl(self, other_var)
297 else:
298 math_op = getattr(_C_ops, op_type)
--> 299 return math_op(self, other_var, 'axis', axis)
300
301 comment = OpProtoHolder.instance().get_op_proto(op_type).comment

ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [20, 128, 7, 7] and the shape of Y = [20, 128, 6, 6]. Received [7] in X is not equal to [6] in Y at i:2.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:84)
[operator < elementwise_add > error]
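  8. Steps to reproduce the problem: Use the officially provided training script.
  9. Additional context: The ChangeStar, FCCDN, and DSIFN models all run into a similar problem, with nearly identical error messages.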
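
The broadcasting rule quoted in the hint of the error log (two dimensions are compatible when they are equal or when either is at most 1) can be sketched in plain Python. This only illustrates the rule: `first_mismatch` is a hypothetical helper, not part of PaddlePaddle, and it assumes both shapes have the same rank, as they do in this error.

```python
def first_mismatch(x_shape, y_shape):
    """Return the index of the first axis that cannot be broadcast, or None.

    Hypothetical helper that mirrors the condition in the error hint:
    x_dims[i] == y_dims[i] or x_dims[i] <= 1 or y_dims[i] <= 1.
    Assumes both shapes have the same number of dimensions.
    """
    for i, (x, y) in enumerate(zip(x_shape, y_shape)):
        if not (x == y or x <= 1 or y <= 1):
            return i
    return None


# Shapes taken from the error message in this report.
print(first_mismatch([20, 128, 7, 7], [20, 128, 6, 6]))  # 2
```

The result matches the `at i:2` in the log: the batch and channel axes agree, and the first spatial axis (7 versus 6) is where the element-wise addition fails.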

@Programrookie33 Programrookie33 added the bug Something isn't working label Jul 3, 2024
@github-actions github-actions bot added triage new issue/PR waiting to be dealed and removed triage new issue/PR waiting to be dealed labels Jul 3, 2024
@Programrookie33 Programrookie33 changed the title [Bug] Error when using the change detection models in PaddleRS Jul 3, 2024
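@Bobholamovic
Member

This looks like a size mismatch. I suggest adding random cropping in the training stage and resizing in the validation stage so that the input images have a fixed size. Because of rounding during downsampling and upsampling in U-Net-style architectures, the height and width of the images fed to the network should preferably be integer multiples of 4 or 8 (depending on the specific model).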
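
As a rough illustration of the rounding effect described in this comment, the sketch below halves a size several times and then doubles it back. It assumes every downsampling step rounds down (floor division) and every upsampling step exactly doubles the size. Real layers may round differently depending on padding and stride, and `down_then_up` is a hypothetical helper, not a PaddleRS function.

```python
def down_then_up(size, levels):
    """Halve `size` `levels` times with floor division, then double it back.

    Hypothetical model of a U-Net-style encoder/decoder path. If the result
    differs from `size`, a skip connection at the top level would see
    mismatched shapes.
    """
    s = size
    for _ in range(levels):
        s //= 2
    return s * 2 ** levels


# With 3 levels, only multiples of 2**3 = 8 survive the round trip.
for size in (224, 225, 230, 232):
    print(size, down_then_up(size, levels=3))
# 224 224
# 225 224
# 230 224
# 232 232
```

Under these assumptions, a network with `levels` halving steps needs input sides divisible by `2 ** levels`, which is where the "multiple of 4 or 8" guideline comes from.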

@Programrookie33
Author

Yes, I have already added random cropping and resizing to the code, but I still get the same error.

@Bobholamovic
Member

What is the current input size?

@Programrookie33
Author

224*224

@Bobholamovic
Member

Bobholamovic commented Jul 4, 2024

Was this done with data preprocessing operators, or were the images cropped in advance? If the former, please post the code.

@Programrookie33
Author

train_transforms = T.Compose([
    # Read the data
    T.DecodeImg(),
    # Random horizontal flip (disabled)
    # T.RandomHorizontalFlip(prob=0.2),
    # Random vertical flip (disabled)
    # T.RandomVerticalFlip(prob=0.3),
    # Normalize the data to [-1, 1]
    T.Normalize(
        mean=[0.5, 0.5, 0.5],
        std=[0.5, 0.5, 0.5]
    ),
    T.RandomCrop(
        # The cropped region is rescaled to 224x224
        crop_size=224,
        # The aspect ratio of the cropped region varies between 0.5 and 2
        aspect_ratio=[0.5, 2.0],
        # The size of the cropped region relative to the original image varies within a range, never below 1/5 of the original height and width
        scaling=[0.2, 1.0]),
    T.RandomFlipOrRotate(
        probs=[0.1, 0.1],  # p=0.1 to flip the image, p=0.1 to rotate it, p=0.8 to keep it unchanged
        probsf=[0.1, 0.1, 0, 0, 0],  # p=0.1 each for horizontal and vertical flipping; probability of no flipping is 0.8
        probsr=[0, 0.2, 0]),  # p=0.2 to rotate the image by 180°; probability of no rotation is 0.8
    T.ReloadMask(),
    # Select the data needed during training and arrange it in the specified order
    T.ArrangeChangeDetector('train')
])
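eval_transforms = T.Compose([
    T.DecodeImg(),
    # During validation, feed images at their original size and only normalize them
    # Validation must use the same normalization as training
    T.Normalize(
        mean=[0.5, 0.5, 0.5],
        std=[0.5, 0.5, 0.5]
    ),
    T.Resize(224),
    T.ReloadMask(),
    # Select the data needed during validation and arrange it in the specified order
    T.ArrangeChangeDetector('eval')
])
The images were cropped in advance, but I also added random cropping in the code.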

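@Bobholamovic
Member

That's a bit strange... I suggest printing the shapes of tensors such as x, e1, e2, e3, and d3 inside the forward method at line 135 of ~/work/PaddleRS/paddlers/rs_models/cd/fccdn.py (for example, print(x.shape)), so we can check whether they match expectations.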

@Programrookie33
Author

For the FCCDN model, the shapes are:
x.shape=[20, 64, 14, 14]
e1.shape=[20, 64, 14, 14]
e2.shape=[20, 128, 7, 7]
e3.shape=[20, 256, 3, 3]
d3.shape=[20, 128, 3, 3]

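@Bobholamovic
Member

I read through the source code. The FCCDN model probably requires the input size to be an integer multiple of 64 (because the encoder downsamples by a factor of 16, and NLFPN then downsamples by a further factor of 4). I suggest trying an input size of 192 or 256.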
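
The arithmetic behind this suggestion can be sketched as follows. The model here is an assumption built only from the shapes printed earlier in this thread and the description above (16x downsampling in the encoder, two further halvings that round down, and 2x upsampling before each addition); it is not taken from the FCCDN source, and `trace_sizes` is a hypothetical helper.

```python
def trace_sizes(input_size):
    """Trace assumed spatial sizes through the encoder and the NLFPN center block.

    Assumptions (not verified against fccdn.py): the encoder divides the size
    by 16, each further downsampling step uses floor division by 2, and each
    upsampling step doubles the size.
    """
    x = input_size // 16   # size entering the center block
    e1 = x
    e2 = e1 // 2
    e3 = e2 // 2
    d3_up = e3 * 2         # upsampled d3, added to e2
    d2_up = e2 * 2         # upsampled d2, added to e1
    ok = d3_up == e2 and d2_up == e1
    return {"x": x, "e2": e2, "e3": e3, "d3_up": d3_up, "ok": ok}


for size in (224, 192, 256):
    print(size, trace_sizes(size))
# 224 {'x': 14, 'e2': 7, 'e3': 3, 'd3_up': 6, 'ok': False}
# 192 {'x': 12, 'e2': 6, 'e3': 3, 'd3_up': 6, 'ok': True}
# 256 {'x': 16, 'e2': 8, 'e3': 4, 'd3_up': 8, 'ok': True}
```

For an input of 224 the sketch reproduces the reported shapes (x at 14, e2 at 7, e3 at 3) and the 7 versus 6 mismatch from the error message, while 192 and 256 pass both additions.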

@Programrookie33
Author

OK, it runs now! Thank you very much for your patient answers!
