Add comments to ff fitting example, + some function signatures/docstrings #2

lilyminium · 2024-08-12T05:24:20Z

Hi @trevorgokey, I was going through the 07_besmarts_fit example and making some notes for myself on what each line does / the Python object structures, as the tutorial tends to show a large portion of code at once and pull out some highlights to explain. Just to check my understanding, do you reckon the comments are accurate or have I completely misunderstood some of what's going on? I do have some questions remaining on the data structures that I'll add as comments.

In case you think the notes will be useful to anyone else, I've opened this PR just to suggest adding them in.

besmarts-core/python/examples/07_besmarts_fit.py

lilyminium · 2024-08-12T05:27:48Z

besmarts-core/python/examples/07_besmarts_fit.py


    # 2. Configure the objective tiers
    final = fits.objective_tier()
+    # set the objectives for the final tier
+    # each value is an ``objective_config``
    final.objectives = {
        0: fits.objective_config_position(


I'm not too sure what the keys final.objectives represent -- do they represent order, or do they maybe correspond with the assignments.POSITIONS index? If the latter, constructing the dict with {assignments.POSITIONS: fits.objective_config_position(... might be clearer.

The objective keys are completely arbitrary ints; they represent order. I did it more for the case of when the the objective list gets split up they maintain their "index"

I see, thank you! So is it more of an administrative/logging device to track each objective individually, or does it affect how the objective is computed? If it's just a naming thing, are integers the only keys accepted and does it expect them to be contiguous?

They have no almost no impact on how the objective is computed, other than the order they are added (FIFO; put expensive things first). I haven't put much thought into the objective keys, but I've been assuming ints. Looking through the code, I think the only part that uses the objective key is when I break up the particular computation into pieces, I keep track of ownership using the key. So, I would guess anything hashable is usable as a key. For example I have this in optimizers_scipy:

tasks[(n, (i, (0,)))] = task

where n is the iteration number, i is the objective key in question, and (0,) refers to no parameter deltas (e.g. the f(x0) in the numerical derivatives. This is mostly useful for Hessians, where a molecule can have 100s of deltas to compute, and I break them into pieces and distribute them for computation. I collect the results based on the objective key (i above). That said, they shouldn't need to be continuous, just something hashable to organize computations with.

lilyminium · 2024-08-12T06:06:17Z

besmarts-core/python/examples/07_besmarts_fit.py

+        # create a graph and add it to the graph_db
+        g: graphs.graph = gcd.smiles_decode(smi)
+        # molecule_graph_id is the graph ID of the graph in the graph_db
+        molecule_graph_id: int = assignments.graph_db_add_graph(gdb, smi, g)


Is it possible to use besmarts with multi-molecule data (e.g. dimers) at all?

Stuff like dimers is the reason I made the much more complicated graph_db data structure.. however right now nearly everything assumes a single molecule, so it would need some work/checking to get going correctly. Each molecule coordinates would be a graph_db_column and the graph_db_row could be considered a single snapshot/frame, where each of these would be associated to the graph_db_graph of the same graph ID.

Ahh I think I understand. So can a graph.graph contain multiple molecules? IIRC the graph_same check only checks encoded primitives so coordinates wouldn't differentiate the molecules. So I'm wondering if a dimer energy of identical molecules would need to reference the same graph ID twice, and how that would interact with graph_db_tables.gid.

(Just to check my understanding -- does the below sound right to you?)
Each graph_db_entry in the gdb contains graph_db_tables, which contains graph_db_graphs that are referenced by gid. Coordinates are stored separately by atom in graph_db_columns, which are collected in graph_db_rows. In the case of using an interaction energy between two different molecules, I think the value at gde.tables[assignments.ENERGIES].values[snapshot_i] would be the interaction energy at gde.tables[assignments.POSITIONS].graphs[gid].rows[snapshot_i]?

I've been treating all graphs as fully connected; I haven't gone into much thinking/testing whether disconnected graphs should be allowed as of yet. There are pros and cons to doing so, but right now I'm leaning towards having everything connected (and maybe adding empty bonds for specific problems; these would encode to . in the string representation). I believe everything in graphs is strictly 2D based, so yes the graph operations won't pick up on different conformations. The dimer would most certainly reference the same graph ID; the underlying graph is how selections are referenced in the graph_db_columns (if you have IC data; perhaps less relevant here). As an example you can query bonds of a graph, say (1, 2) by calling graphs.graph_bonds(gdb.graphs[gid]), which would be a selection in the graph_db_column, such as gde.tables[BONDS][gid][rid][cid][(1,2)] (if you put data there). There is no way to specify conformations to a graph object; think of them as part of a regular FF topology. The easiest way to do so would be a assignments.graph_assignment object, which are used in most places needing positions.

And yes, that is a quick way to make that work and probably the only sane way to do so, showing the usefulness of having a dumb values member :) I read that as the total energy for a particular snapshot. There may be two gids each with one column for snapshot_i, or the same gid with two columns with different sets of coordinates.

trevorgokey · 2024-08-12T16:25:27Z

Hi @lilyminium, I appreciate your effort here. I think the intention here is to leave this up for some time as a means to take notes, or are you done? In general I would be happy to merge your input as part of improving the example's clarity.

lilyminium · 2024-08-13T08:33:31Z

besmarts-core/python/examples/07_besmarts_fit.py

+                    energy_data = f.read().split("\n")
+                # assume energy data is a list of floats with one energy per line
+                energy_values = [*map(float, [x for x in energy_data if x])]
+                energy_gdt.values.extend(energy_values)


If this example contained multiple observations to fit to (so multiple gids I assume), how would besmarts map each energy value to the corresponding coordinates? I could be missing something here but I don't see any modification of energy_gdt.graphs (although I guess energies aren't actually part of this example!)

So I never really intended to use energies in this example, and this snippet is more destructive than constructive. The value member field was just a lazy way to append data without having to deal with the data structure as I worked on various objective functions (hessian freqs, DLC gradients, etc), so I consider the values member to be "user defined" :). To answer your question generally, the (proper/designed) way to do this would be to set up a graph_db table (with a null topology since it is per graph/molecule/system) and then you would assign energies as you would like. I note that a weakness here is that I don't have a great way to set a total energy anywhere, it would have to be a sum over all graph_db_graph objects for a particular row_id. This would give you a bunch of graph_db_columns, and for null topologies I reserve the selection (0,) to refer to the entire molecule, remembering that I use 1-index for selections for atoms so 0 is never used anywhere else. (I haven't actually done this anywhere, but this is how I will do it once I start fitting energies). This would be mostly for QM energies, but it should be possible to do IC energies instead for MM. One could set up a custom addressing to store the totals in the values member; this greatly simplifies computing the objective at the end, which is essentially why I added it in the first place.

lilyminium · 2024-08-13T08:40:57Z

Thanks Trevor, I still have a couple questions that might help with the notes but in general happy for you to use this however you like, and merge if you think it'll be helpful! I can let you know when I'm done updating.

trevorgokey · 2024-08-14T01:31:59Z

I've updated this example and refactored it to use stuff that is in core.

lilyminium added 5 commits August 8, 2024 16:57

start adding comments

b0d36e4

remove redundant additions to graph db?

8f4c678

remove redundant enumerate

47dad24

add types and comment gdb creation

1f9ebe4

add some more comments and signatures

9c11575

lilyminium commented Aug 12, 2024

View reviewed changes

besmarts-core/python/examples/07_besmarts_fit.py Outdated Show resolved Hide resolved

lilyminium commented Aug 12, 2024

View reviewed changes

lilyminium commented Aug 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add comments to ff fitting example, + some function signatures/docstrings #2

Add comments to ff fitting example, + some function signatures/docstrings #2

lilyminium commented Aug 12, 2024

lilyminium Aug 12, 2024 •

edited

Loading

trevorgokey Aug 12, 2024

lilyminium Aug 13, 2024 •

edited

Loading

trevorgokey Aug 14, 2024

lilyminium Aug 12, 2024

trevorgokey Aug 12, 2024

lilyminium Aug 13, 2024

trevorgokey Aug 14, 2024

trevorgokey commented Aug 12, 2024

lilyminium Aug 13, 2024

trevorgokey Aug 14, 2024 •

edited

Loading

lilyminium commented Aug 13, 2024

trevorgokey commented Aug 14, 2024

Add comments to ff fitting example, + some function signatures/docstrings #2

Are you sure you want to change the base?

Add comments to ff fitting example, + some function signatures/docstrings #2

Conversation

lilyminium commented Aug 12, 2024

lilyminium Aug 12, 2024 • edited Loading

Choose a reason for hiding this comment

trevorgokey Aug 12, 2024

Choose a reason for hiding this comment

lilyminium Aug 13, 2024 • edited Loading

Choose a reason for hiding this comment

trevorgokey Aug 14, 2024

Choose a reason for hiding this comment

lilyminium Aug 12, 2024

Choose a reason for hiding this comment

trevorgokey Aug 12, 2024

Choose a reason for hiding this comment

lilyminium Aug 13, 2024

Choose a reason for hiding this comment

trevorgokey Aug 14, 2024

Choose a reason for hiding this comment

trevorgokey commented Aug 12, 2024

lilyminium Aug 13, 2024

Choose a reason for hiding this comment

trevorgokey Aug 14, 2024 • edited Loading

Choose a reason for hiding this comment

lilyminium commented Aug 13, 2024

trevorgokey commented Aug 14, 2024

lilyminium Aug 12, 2024 •

edited

Loading

lilyminium Aug 13, 2024 •

edited

Loading

trevorgokey Aug 14, 2024 •

edited

Loading