@Le-soleile Le-soleile commented Oct 24, 2025

PR Category

Dataset adaptation

PR Types

Add adapters

Description

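The tmQM dataset is a comprehensive quantum chemistry database of transition metal complexes, made up of several complementary data files: the .xyz files provide molecular geometries optimized at the GFN2-xTB level; the .csv files contain SMILES representations and quantum chemical properties computed at the TPSSh/def2SVP level, including electronic energy, dispersion energy, dipole moment, metal charge, HOMO/LUMO gap and energies, and polarizability (the polarizability is the exception: it is computed separately at the GFN2-xTB level); the .q files provide natural atomic charges computed at the TPSSh/def2SVP level; and the .BO files contain Wiberg bond orders and atomic valence indices computed at the GFN2-xTB level.
Website: https://www.uiocompcat.info/tmqmdataset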

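This PR merges the geometry information from the .xyz files and the property features from the .csv files into a single dataset for training. For model applicability, it also provides an option for whether to use the .q and .BO files during training. The adapter has been tested with the MEGNet model.
Example (bash): python property_prediction/train.py -c property_prediction/configs/megnet/megnet_tmqm_train_108k_electronic_e.yaml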

paddle-bot bot commented Oct 24, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Oct 24, 2025

@leeleolay (Collaborator) left a comment

Please add the training results for this dataset to the MEGNet README file.

path_bond: "./data/tmqm/tmQM_X.BO"
electronic_e_key: "electronic_e"
property_names: ${Global.label_names}
use_atomic_charge: True

Collaborator

Does this setting control whether the .q file is read? I suggest adding a comment here.

Author

ok

electronic_e_key: "electronic_e"
property_names: ${Global.label_names}
use_atomic_charge: True
use_chemical_bonding: True

Collaborator

Same as above.
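
To illustrate what the requested comments would document, here is a minimal sketch of how two boolean options might gate the optional inputs. The function name `select_input_files` and the `path_xyz`, `path_csv`, and `path_charge` keys are hypothetical; only `path_bond`, `use_atomic_charge`, and `use_chemical_bonding` appear in the config shown in this thread, and the actual adapter may wire them differently.

```python
def select_input_files(config):
    """Return the list of data files to read for a given config dict.

    Assumption: use_atomic_charge toggles the .q file and
    use_chemical_bonding toggles the .BO file.
    """
    files = [config["path_xyz"], config["path_csv"]]  # always required
    if config.get("use_atomic_charge", False):
        files.append(config["path_charge"])  # optional atomic charges
    if config.get("use_chemical_bonding", False):
        files.append(config["path_bond"])  # optional bond orders
    return files


# Placeholder paths for illustration only, not taken from the PR.
example_config = {
    "path_xyz": "geometry.xyz",
    "path_csv": "properties.csv",
    "path_charge": "charges.q",
    "path_bond": "bonds.BO",
    "use_atomic_charge": True,
    "use_chemical_bonding": False,
}
print(select_input_files(example_config))
```

Defaulting both flags to `False` when they are absent means a config written without them still loads only the two required files.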

sample_structure = structures[0]
logger.info(f"Structure object type: {type(sample_structure)}")
logger.info(f"Structure object attributes: {dir(sample_structure)}")
# If you use supplementary data, incorporate it into the structure

Collaborator

if len(structures) > 0:
    sample_structure = structures[0]
    logger.info(f"Structure object type: {type(sample_structure)}")
    logger.info(f"Structure object attributes: {dir(sample_structure)}")

What is the purpose of this block of code?

Author

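After the structure data is generated, this block takes the first structure object as a sample and logs its type and all available attributes. It is used for debugging and for verifying that the object returned by BuildStructure is what we expect.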
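
Since the snippet exists only for debugging, one option is to log at debug level and keep the emptiness guard, so that normal training runs stay quiet. The sketch below uses only the standard `logging` module; `log_sample_structure` is a hypothetical helper name, and the list passed in stands in for whatever BuildStructure actually returns.

```python
import logging

logger = logging.getLogger("tmqm_adapter_example")  # hypothetical logger name


def log_sample_structure(structures):
    """Log the type and public attributes of the first structure, if any."""
    if len(structures) == 0:
        logger.debug("No structures were built; nothing to inspect.")
        return None
    sample = structures[0]
    logger.debug("Structure object type: %s", type(sample))
    # Skip private and dunder names to keep the log line readable.
    public_attrs = [name for name in dir(sample) if not name.startswith("_")]
    logger.debug("Structure object attributes: %s", public_attrs)
    return sample


# Enable debug output only when inspecting the adapter.
logging.basicConfig(level=logging.DEBUG)
log_sample_structure([{"placeholder": 1}])
```

Passing the values as logging arguments instead of f-strings defers the string formatting until a handler actually emits the record.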

@Le-soleile
Author

Please add the training results for this dataset to the MEGNet README file.

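I have not completed a full training run (it takes too long, and at school I cannot keep a job running and monitored the whole time). The data adapter runs correctly, and I will fill in the dataset information and config file columns of the table 😊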

<td nowrap="nowrap">megnet_tmqm_train_108k_dispersion_e</td>
<td nowrap="nowrap">tmQM_108k</td>
<td nowrap="nowrap">Dispersion Energy (Hartree)</td>
<td nowrap="nowrap"> 0.000 / 0.000</td>

Collaborator

Change '0.000 / 0.000' here to '-'.

<td nowrap="nowrap">Electronic Energy (Hartree)</td>
<td nowrap="nowrap"> 0.000 / 0.000</td>
<td nowrap="nowrap">1</td>
<td nowrap="nowrap">~0 hours</td>

Collaborator

Change '~0 hours' to '-'.
