Adding structured datasets for detailed performance characterization #104

colemathis · 2025-07-25T23:50:39Z

Distinct classes of molecules present different challenges for computing assembly indices. Here we add two new datasets, one composed entirely of fused rings, and one composed entirely of tree-like molecules. Both datasets contain only hydrocarbons (i.e., there are no heteroatoms), and they contain only single or double bonds.
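As a rough illustration of the two constraints above (hydrocarbons only, single or double bonds only), here is a minimal sketch of a check that could be run on a V2000 mol file. It is not taken from the scripts in this PR; the function name `is_simple_hydrocarbon` is hypothetical, and the sketch assumes V2000 fixed-width columns and Kekulé-style bond orders (aromatic bond type 4 would be rejected).

```python
def is_simple_hydrocarbon(molblock: str) -> bool:
    """Return True if a V2000 mol block has only C/H atoms and only
    single (1) or double (2) bonds. Hypothetical helper, not from the PR."""
    lines = molblock.splitlines()
    counts = lines[3]  # V2000 counts line follows the three header lines
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # The element symbol occupies columns 32-34 (0-indexed 31:34) of an atom line.
    atoms_ok = all(line[31:34].strip() in {"C", "H"} for line in atom_lines)
    # The bond type occupies columns 7-9 (0-indexed 6:9) of a bond line.
    bonds_ok = all(int(line[6:9]) in {1, 2} for line in bond_lines)
    return atoms_ok and bonds_ok
```

A check like this could be applied to every generated file before benchmarking, so that a stray heteroatom or triple bond is caught early.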

The fused ring molecules are selected from an existing dataset: https://pubs.rsc.org/en/content/articlehtml/2024/cp/d4cp01027b

The tree-like molecules are derived from the properties of the fused-ring molecules. For each fused-ring molecule we construct a tree-like molecule with the same number of bonds and the same ratio of single to double bonds.
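The construction described above can be sketched as random tree growth under a carbon valence limit. This is an illustrative sketch only, not the script included in this PR: the function name `random_tree_molecule`, its parameters, and the growth strategy are assumptions, and it makes no claim about sampling trees uniformly.

```python
import random

MAX_VALENCE = 4  # carbon; a double bond uses two units of valence


def random_tree_molecule(n_bonds, n_double, seed=None):
    """Grow a random acyclic carbon skeleton with `n_bonds` bonds,
    `n_double` of which are double bonds (hypothetical helper).

    Returns a list of (parent_atom, child_atom, bond_order) tuples.
    """
    if not 0 <= n_double <= n_bonds:
        raise ValueError("n_double must be between 0 and n_bonds")

    rng = random.Random(seed)
    orders = [2] * n_double + [1] * (n_bonds - n_double)
    rng.shuffle(orders)

    free = [MAX_VALENCE]  # remaining valence per atom; start from one carbon
    bonds = []
    for order in orders:
        # Any atom with enough free valence may accept the new bond.
        # The most recently added atom always has at least 2 free,
        # so this list is never empty.
        candidates = [atom for atom, f in enumerate(free) if f >= order]
        parent = rng.choice(candidates)
        child = len(free)
        free[parent] -= order
        free.append(MAX_VALENCE - order)
        bonds.append((parent, child, order))
    return bonds
```

Because every bond attaches a brand-new atom, the result is connected and acyclic by construction, with `n_bonds + 1` carbon atoms. Passing the bond count and double-bond count of a fused-ring molecule yields a tree-like counterpart with matching totals.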

This PR contains the scripts required to generate the data and the files themselves.

There are additional molecule classes that could be of interest once we have benchmarks on these data.

colemathis requested a review from jdaymude July 25, 2025 23:50

colemathis added 5 commits July 30, 2025 09:39

Adding script to pull data for condensed poly-aromatric rings, downsampling the data based on reasonable compute times, and data that we need, converting to mol files and saving in with meta data

0487621

Sampling random branched alkenes with bond counts derived from the ring data

2dc0c7f

Update readme for new sceripts

fc497cd

Add ring data

b8bc5c5

Add tree data

7676159

jdaymude force-pushed the structured_data branch from 2abdaa9 to 7676159 Compare July 30, 2025 16:40

jdaymude added the data label (Additions or modifications to reference datasets) Aug 4, 2025
