Skip to content

Comments

Set shared library directory#535

Merged
rs028 merged 27 commits intoAtChem:masterfrom
rs028:shared_dir
Oct 22, 2025
Merged

Set shared library directory#535
rs028 merged 27 commits intoAtChem:masterfrom
rs028:shared_dir

Conversation

@rs028
Copy link
Collaborator

@rs028 rs028 commented Jul 10, 2025

Solves a couple of issues with the shared library.

One is described in #414: the build script uses the same argument as destination for the fortran mechanism files and for the shared library, even though (theoretically) the shared library can be elsewhere (as set by the $SHAREDLIBDIR variable in the Makefile). This makes the --shared_lib flag for the executable redundant. I think it makes sense for the shared library to be always in the same folder as the other mechanism files, especially now that we have a customRateFuncs.f90 in the same directory. With this PR we force mechanism.so to always be in the configuration directory and remove the associated flag from the executable. The underlying structure remains in place, so we can always reintroduce the flag in the future, if we change our minds.

The second issue is related to the conversion of the .fac file. The build script always runs the python converter, which means that the fortran mechanism files are always updated, even if they are not changed. This defies the purpose of having a shared library, which should only be recompiled if the .fac file is changed. To correct this, the execution of the python script is moved from the build script to the Makefile. The build script is simplified but the user interface does not change.

Also introduces some error handling and more comments for both the Makefile and the build script.

@codecov
Copy link

codecov bot commented Jul 12, 2025

Codecov Report

❌ Patch coverage is 11.53846% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.75%. Comparing base (c4ed6af) to head (d5ae918).
⚠️ Report is 30 commits behind head on master.

Files with missing lines Patch % Lines
src/argparse.f90 20.00% 12 Missing ⚠️
src/inputFunctions.f90 0.00% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #535      +/-   ##
==========================================
- Coverage   52.05%   51.75%   -0.30%     
==========================================
  Files          17       17              
  Lines        2096     2104       +8     
  Branches      166      167       +1     
==========================================
- Hits         1091     1089       -2     
- Misses        933      943      +10     
  Partials       72       72              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rs028
Copy link
Collaborator Author

rs028 commented Oct 7, 2025

The auto-generated mechanism files (mechanism.* fortran files, customRateFuncs.o, and the shared library mechanism.so) are now created by default in the model/include directory. The content of model/include should be automatically rewritten every time the .fac file or customRateFuncs.f90 are modified (unless I messed up my Makefile instructions).

The model/configuration directory only contains the *.config and *.parameters files, plus customRateFuncs.f90.

The .fac file can be wherever the user wants: by default, it is in model/ although it may make more sense for it to be in model/configuration. Would be good to get some user's opinions on this.

@rs028
Copy link
Collaborator Author

rs028 commented Oct 7, 2025

@spco @AlfredMayhew please have a look at this. It should resolve PR AtChem/AtChem-tools#4, following Sam's suggestion.

The documentation will have to be changed, and I don't have time right now, but I can merge this PR and change the manual later, so Alfie can proceed with the modifications to the Atchem-tools in the meantime.

@AlfredMayhew
Copy link
Collaborator

Thanks @rs028. I've tried to have a go at using this and although the model is running, I'm still seeing the mechanism files in the configuration directory. I'm sure I'm doing something wrong, so let me know and I'll try again with the testing. Here is what I did:

First, I cloned your git repo and checked out the shared-dir branch:

git clone -b shared_dir https://github.com/rs028/AtChem2.git RS_AtChem2/
cd RS_AtChem2/

Then, I copied the Makefile template and adjusted the paths for CVODELIBDIR, OPENLIBMDIR, FRUITDIR.

cp tools/install/Makefile.skel Makefile

Then I compiled and ran the default model:

./build/build_atchem2.sh model/mechanism.fac model/configuration/
./atchem2 --model=model

Looking at the contents of the configuration directory then shows files such as mechanism.f90, mechanism.o, etc. mechanism.f90 is created in the model/include/ subdirectory, but the file only includes the first line (! Note that this file is automatically generated by build/mech_converter.py -- Any manual edits to this file will be overwritten when calling build/mech_converter.py). This is the only file present in include (except for .gitkeep).

Also, one additional issue I found was that the default behaviour of the build script (without specifying a configuration directory), e.g. passing ./build/build_atchem2.sh model/mechanism.fac, interprets the include directory as the config directory, and results in an error.

Am I doing something wrong?

@rs028
Copy link
Collaborator Author

rs028 commented Oct 9, 2025

Thanks for that. Teaches me to not submit changes in a hurry :)

@rs028
Copy link
Collaborator Author

rs028 commented Oct 9, 2025

so there is a little issue here, right now the build script takes 3 arguments:

  • mechanism.fac
  • model/configuration
  • mcm

if we add model/include then we have two options. either we add fourth argument (which is a breaking change), or we put include inside model/configuration. Personally, I don't mind a breaking change but perhaps is best done in conjunction with #416 , as we already changed the user interface of the build process 3 times since the first release.

Any thoughts? If @bsn502 and @willdrysdale also want to pitch in, it would be good to have a wider view of the user's preferences.

@AlfredMayhew
Copy link
Collaborator

I don't mind changing the arguments as it's easy enough to adjust my python tools, and I run 99% of my models through python anyway, but I understand that it's perhaps less convenient for others.
I would say that having include within configuration goes against what we have said previously about tucking the files not edited by the user out of view in their own directory so that configuration only contains files that the user should edit, but it's also similar to the way it's done currently and sounds like it might be easier for users to adapt to (because they don't have to change anything)... I don't really have a strong preference!

@rs028
Copy link
Collaborator Author

rs028 commented Oct 14, 2025

I agree. The original logic was that a user should be able to point at a specific configuration, a specific set of constraints, and a specific mechanism, so one could run models with the same configuration and different mechanisms, or viceversa. The problem is that it was implemented somewhat inconsistently, as we are finding out. The purpose of this PR is resolve some of this issues, but I see this as an intemediate step, hence I am reluctant to change the user interface without a complete solution. I think if we put include inside the configuration directory is the least disruptive change, and solves both the issue you experienced with the auto-generated files, and the issues I outlined above.
I will open another issue to discuss a proper solution.

@willdrysdale
Copy link
Collaborator

I've tried to catch myself up on this issue, but apologies if I have missed something as this is deeper than I have previously looked at the model.

If I understand correctly:

  • This change is motivated by separate issue in Atchem-tools.
  • When running many multiple models sometimes there is a conflict in some shared files (obj) that get deleted and restored during the build process.
  • The solution was to add a lock file, but this had its own issues if the process responsible for removing the lock file died before it was able to.
  • New suggestion is changing the shared_lib in Atchem2 to prevent the issue with obj before it starts. The implementation would probably require a breaking change to Atchem2 as an additional argument would be added to build_atchem2.sh

@bsn502 can definitely give a better option from a user perspective, but I have a couple of things docker related:

  • All we would need to do to support new arguments to build_atchem.sh is update the entrypoint.sh script to support these. Right now it only supports passing the mechanism (i.e argument 1), and assumes defaults for the configuration and mcm paths (arguments 2 and 3).
    • I should definitely update it to support args 2 and 3 anyway, that was some oversight from me.
    • This will give me a good reason to get my head back in and I can add the required github workflows for automatically creating the containers
  • @AlfredMayhew more generally - have you considered the containerised version for your mass model runs? From what I understand about this issue, it would negate the problem as each model run is entirely isolated. Its creation was motivated by running the model on HPCs, so there was no need to configure dependencies and the ephemeral/isolated nature prevents conflicts.
    • As an aside from what I can gather looking through AtChemTools it would not be too difficult to create a version of build_and_run that ran docker / apptainer commands, and it would be interesting to know if it could then slot into the rest of your tools!

@rs028
Copy link
Collaborator Author

rs028 commented Oct 14, 2025

Hi Will, sort of. Part of the this PR is fixing a bug where the model was always recompiling the shared library, even if the mechanism file had not changed. It also hides one of the executable flags (--shared_lib) to default values (may actually revert this last point, depending on the discussion in #539). The rest of the changes are related to the issue with AtChem-tools you describe.

@rs028 rs028 requested a review from AlfredMayhew October 14, 2025 17:26
@rs028
Copy link
Collaborator Author

rs028 commented Oct 21, 2025

@AlfredMayhew when you have time to review and approve this, I can merge and we can continue the discussion on #539.

@AlfredMayhew
Copy link
Collaborator

@rs028 Sure, I should have some time this afternoon!

@willdrysdale, I think Sam has mentioned the containerised version before (or maybe it was you another time?), so it's definitely something I should look into. I'm currently not really familiar with how any of that works (same with docker / apptainer), so it would be something I'd have to do some work to implement. I guess for now we can run with this fix and we can switch across later if it's worthwhile (either when I find time or if someone else wants to make those changes to the tools directory themselves).

@AlfredMayhew
Copy link
Collaborator

@rs028 Ok, I've had a quick test and it seems to be behaving as expected now (creating files in model/configuration/include). A couple of issues that I've run into. I think that both of these issues could be addressed as separate issues if you wanted to get these changes merged, since the main functionality is working.

First, I'm running into some issues using customRateFuncs.f90. The file doesn't seem to have access to the same fortran modules as before. Previously, I was using the types_mod defined in src/dataStructures.f90 in my functions to assign variable types. I'm not really sure how Fortran knows what modules are accessible by other modules, but before a function like the following would work, whereas now I get an error stating Fatal Error: Cannot open module file ‘types_mod.mod’ for reading at (1): No such file or directory:

module custom_functions_mod
  implicit none

contains

  pure function TestFunc( ) result ( k )
    use types_mod

    real(kind=DP) :: k              
    k = 1.0_dp
    
    return
  end function TestFunc

 
end module custom_functions_mod

This is not an issue for the test case I wrote in tests/model_tests/spec_model_func/configuration/customRateFuncs.f90 since that doesn't rely on types_mod. I think I used types_mod in my other functions because the GEOS-Chem functions have a similar structure where variables are defined as real(kind=DP) etc., so it was easier for me to directly copy the functions if I used types_mod.

Secondly, annoyingly, this still doesn't seem to resolve my issues when running parallel models! Again, this may be best for a separate discussion, or to just switch across to the containerised version, but I'll briefly describe the issue here. Again, this seems to relate to types_mod, but I assume that's just a coincidence?
When I run multiple parrallel models, some succeed, but most fail. During the build step, make fails with the following error:

f951: Fatal Error: Can't rename module file ‘obj/types_mod.mod0’ to ‘obj/types_mod.mod’: No such file or directory
compilation terminated.
make: *** [Makefile:143: model_2025-10-21_10-03-05.128012/configuration//include/mechanism.so] Error 1

Then, when the model is run, there is a memory allocation error: malloc(): corrupted top size. I've attached a log file with the full output.

Thanks!

Accr_NOx-VOC_47.log

@rs028
Copy link
Collaborator Author

rs028 commented Oct 21, 2025

It may be my mistake. In the Makefile we have:

$(SHAREDLIB): $(MECH_GEN)
	$(FORT_COMP) -c $(SRC)/dataStructures.f90 $(FSHAREDFLAGS) -o $(MECHDIR)/dataStructures.o -J$(OBJ) -I$(OBJ)
	$(FORT_COMP) -c $(CONFIGDIR)/customRateFuncs.f90 $(FSHAREDFLAGS) -o $(MECHDIR)/customRateFuncs.o -J$(MECHDIR) -I$(MECHDIR)
	$(FORT_COMP) -c $(MECHDIR)/mechanism.f90 $(FSHAREDFLAGS) -o $(MECHDIR)/mechanism.o -J$(OBJ) -I$(OBJ)
	$(FORT_COMP) -shared -o $(SHAREDLIB) $(MECHDIR)/dataStructures.o $(MECHDIR)/customRateFuncs.o $(MECHDIR)/mechanism.o

Try changing -J$(OBJ) -I$(OBJ) to -J$(MECHDIR) -I$(MECHDIR) and see if it solves the issue. These tell the compiler the destination of the modules and where to find them.

@AlfredMayhew
Copy link
Collaborator

Nice, that change seems to have solved both problems!
Just to check, the phrase -J$(OBJ) -I$(OBJ) also appears further up in the Makefile (line 109), I didn't change that line, just the two lines in the section you quoted above. I'm guessing that's fine since everything worked.

@rs028
Copy link
Collaborator Author

rs028 commented Oct 22, 2025

Great, thanks. These lines refer to the compilation of the shared library (hence MECHDIR), while the previous -J$(OBJ) -I$(OBJ) line refers to the compilation of the executable, so it is correct that the compiler writes/reads from the obj/ directory.

@rs028 rs028 merged commit 0a1c294 into AtChem:master Oct 22, 2025
5 of 7 checks passed
@rs028 rs028 mentioned this pull request Oct 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants