【PPMat Dataset No.2】Jarvis-DFT3D dataset adaptation #203
Merged
PR description
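While testing the Jarvis-DFT3D family of datasets, every config file ran successfully except megnet_jarvis_dft_3d_2021_bulk_modulus.yaml, which raised an error during the run. Investigation showed that the error is caused by invalid strings in the dataset (such as 'na'), which the current JarvisDataset does not handle properly when dealing with invalid or missing data. This PR strengthens data filtering and runtime validation so that invalid samples are removed during initialization, and any invalid value that still reaches training produces a clear error message.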
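1. Improved property filtering (filter_unvalid_by_property)
Before: invalid strings (such as 'na' and 'none') were not recognized, so those samples were wrongly kept.
After: common invalid values ('na', 'nan', 'none', '', etc.) are recognized, converted to np.nan, and removed during filtering.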
This change removes invalid strings and missing values as soon as the data is loaded, which makes filtering more robust and consistent.
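A minimal sketch of this normalize-then-filter idea in plain Python, assuming each record is a dict mapping property names to raw values. The names INVALID_STRINGS, to_float_or_nan, and filter_records_by_property are hypothetical and are not the actual JarvisDataset code; the sketch uses math.nan where the PR uses np.nan (both are ordinary float NaN values).

```python
import math
from typing import Any, Iterable, List

# Hypothetical set of placeholder strings treated as missing, mirroring the
# examples listed above ('na', 'nan', 'none', '').
INVALID_STRINGS = {"na", "nan", "none", ""}


def to_float_or_nan(value: Any) -> float:
    """Convert a raw property value to float, mapping invalid entries to NaN."""
    if value is None:
        return math.nan
    # Placeholder strings are matched case-insensitively, ignoring whitespace.
    if isinstance(value, str) and value.strip().lower() in INVALID_STRINGS:
        return math.nan
    try:
        return float(value)
    except (TypeError, ValueError):
        # Anything else that cannot be parsed as a number is treated as missing.
        return math.nan


def filter_records_by_property(records: Iterable[dict], key: str) -> List[dict]:
    """Keep only records whose `key` property converts to a non-NaN float."""
    kept = []
    for rec in records:
        value = to_float_or_nan(rec.get(key))
        if not math.isnan(value):
            # Store the cleaned float so downstream code never sees raw strings.
            kept.append({**rec, key: value})
    return kept
```

For example, filtering [{"bulk_modulus": "na"}, {"bulk_modulus": "120.5"}] on the (hypothetical) key "bulk_modulus" keeps only the second record, with its value converted to 120.5. Converting every invalid form to NaN first means a single isnan check covers placeholder strings, None, and genuine NaN values alike.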
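2. New runtime sample validation (__getitem__)
Before: __getitem__ performed no validation of sample values.
After: __getitem__ runs a runtime check that detects the string 'na' and NaN values.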
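This check catches bad values as soon as a sample is accessed and reports which sample and property are at fault, which makes such problems much easier to debug.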
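A minimal sketch of such a runtime check, assuming a map-style dataset whose samples are dicts of property values. ValidatedPropertyDataset is a hypothetical class, not the PPMat JarvisDataset; it only illustrates raising a descriptive error from __getitem__ that names the offending sample and property.

```python
import math
from typing import Any, Dict, List


class ValidatedPropertyDataset:
    """Hypothetical map-style dataset that validates target values on access."""

    def __init__(self, samples: List[Dict[str, Any]], property_names: List[str]):
        self.samples = samples
        self.property_names = property_names

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        sample = self.samples[idx]
        for name in self.property_names:
            value = sample.get(name)
            # Reject the literal 'na' placeholder (case-insensitive).
            if isinstance(value, str) and value.strip().lower() == "na":
                raise ValueError(
                    f"Sample {idx}: property '{name}' is the placeholder string {value!r}"
                )
            # Reject NaN floats that slipped past initialization-time filtering.
            if isinstance(value, float) and math.isnan(value):
                raise ValueError(f"Sample {idx}: property '{name}' is NaN")
        return sample
```

Because NumPy float64 scalars subclass Python float, a np.nan stored in a sample is also caught by the isinstance check. Raising immediately with the index and property name points straight at the bad record, instead of letting the value surface later as a less specific error inside the model or loss.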
Verification results
megnet_jarvis_dft_3d_2021_bulk_modulus.yaml

@leeleolay @luotao1