Add LOBSTER workflow schema #60

Draft
wants to merge 32 commits into
base: master
from

Conversation

naik-aakash
Contributor

@naik-aakash naik-aakash commented Jun 5, 2025

Changes

  1. Add a LOBSTERWorkflow class for representing LOBSTER workflows
  2. Adapt workflow parsing from the generic workflow to LOBSTERWorkflow
  3. Add the missing basis-function information used for projections to the parsed quantities

Closes #59

@naik-aakash naik-aakash marked this pull request as draft June 5, 2025 05:03
@naik-aakash naik-aakash marked this pull request as ready for review June 8, 2025 15:35
@naik-aakash naik-aakash changed the title add LOBSTER workflow schema [WIP] add LOBSTER workflow schema Jun 8, 2025
@naik-aakash
Contributor Author

naik-aakash commented Jun 8, 2025

Dummy workflow.archive.yaml file to generate and connect the underlying DFT and LOBSTER runs

workflow2:
  m_def: workflowparsers.lobster.workflow.LOBSTERWorkflow
  inputs:
    - name: Structure
      section: '../upload/archive/mainfile/BaTiO3_0/vasprun.xml.gz#/run/0/system/-1'
  outputs:
    - name: LOBSTER 1
      section: '../upload/archive/mainfile/BaTiO3_0/lobsterout.gz#/run/0/calculation/-1'
    - name: LOBSTER 2
      section: '../upload/archive/mainfile/BaTiO3_1/lobsterout.gz#/run/0/calculation/-1'
  tasks:
    - m_def: nomad.datamodel.metainfo.workflow.TaskReference
      task: '../upload/archive/mainfile/BaTiO3_0/vasprun.xml.gz#/workflow2'
      name: DFT run
      inputs:
        - name: Input structure
          section: '../upload/archive/mainfile/BaTiO3_0/vasprun.xml.gz#/run/0/system/-1'
      outputs:
        - name: Output DFT calculation
          section: '../upload/archive/mainfile/BaTiO3_0/vasprun.xml.gz#/run/0/calculation/-1'
    - m_def: nomad.datamodel.metainfo.workflow.TaskReference
      task: '../upload/archive/mainfile/BaTiO3_0/lobsterout.gz#/workflow2'
      name: LOBSTER run 1
      inputs:
        - name: Structure and Planewavefunction
          section: '../upload/archive/mainfile/BaTiO3_0/vasprun.xml.gz#/run/0/calculation/-1'
      outputs:
        - name: Bonding analysis data
          section: '../upload/archive/mainfile/BaTiO3_0/lobsterout.gz#/run/0/calculation/-1'
    - m_def: nomad.datamodel.metainfo.workflow.TaskReference
      task: '../upload/archive/mainfile/BaTiO3_1/lobsterout.gz#/workflow2'
      name: LOBSTER run 2
      inputs:
        - name: Structure and Planewavefunction
          section: '../upload/archive/mainfile/BaTiO3_0/vasprun.xml.gz#/run/0/calculation/-1'
      outputs:
        - name: Bonding analysis data
          section: '../upload/archive/mainfile/BaTiO3_1/lobsterout.gz#/run/0/calculation/-1'
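For uploads with many LOBSTER runs, the same layout could be generated rather than written by hand. The following is a minimal sketch, assuming one DFT mainfile feeds every LOBSTER run, as in the dummy file above. The helper names `ref` and `build_lobster_workflow` are hypothetical and not part of the PR; only the keys and reference strings are taken from the YAML shown here.

```python
from typing import Dict, List

# Reference prefix used throughout the dummy workflow.archive.yaml.
ARCHIVE_PREFIX = "../upload/archive/mainfile/"
TASK_REFERENCE = "nomad.datamodel.metainfo.workflow.TaskReference"


def ref(mainfile: str, section: str) -> str:
    """Build a reference string of the form '<prefix><mainfile>#/<section>'."""
    return f"{ARCHIVE_PREFIX}{mainfile}#/{section}"


def build_lobster_workflow(dft_mainfile: str, lobster_mainfiles: List[str]) -> Dict:
    """Return a dict that mirrors the structure of the dummy YAML file."""
    dft_system = ref(dft_mainfile, "run/0/system/-1")
    dft_calc = ref(dft_mainfile, "run/0/calculation/-1")

    # The DFT run is always the first task.
    tasks = [
        {
            "m_def": TASK_REFERENCE,
            "task": ref(dft_mainfile, "workflow2"),
            "name": "DFT run",
            "inputs": [{"name": "Input structure", "section": dft_system}],
            "outputs": [{"name": "Output DFT calculation", "section": dft_calc}],
        }
    ]
    outputs = []
    # Each LOBSTER run consumes the DFT calculation and yields bonding data.
    for index, lobster_mainfile in enumerate(lobster_mainfiles, start=1):
        lobster_calc = ref(lobster_mainfile, "run/0/calculation/-1")
        outputs.append({"name": f"LOBSTER {index}", "section": lobster_calc})
        tasks.append(
            {
                "m_def": TASK_REFERENCE,
                "task": ref(lobster_mainfile, "workflow2"),
                "name": f"LOBSTER run {index}",
                "inputs": [
                    {"name": "Structure and Planewavefunction", "section": dft_calc}
                ],
                "outputs": [
                    {"name": "Bonding analysis data", "section": lobster_calc}
                ],
            }
        )

    return {
        "workflow2": {
            "m_def": "workflowparsers.lobster.workflow.LOBSTERWorkflow",
            "inputs": [{"name": "Structure", "section": dft_system}],
            "outputs": outputs,
            "tasks": tasks,
        }
    }
```

Calling `build_lobster_workflow("BaTiO3_0/vasprun.xml.gz", ["BaTiO3_0/lobsterout.gz", "BaTiO3_1/lobsterout.gz"])` yields the same structure as the file above. If PyYAML is installed, the dict can be written out with `yaml.safe_dump(workflow, sort_keys=False)`.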

Example Workflow depiction

image

@naik-aakash
Contributor Author

naik-aakash commented Jun 8, 2025

Hi @ondracka, @ladinesa, @ndaelman-hu, @JFRudzinski, it would be great if any of you could provide some feedback to improve this further.

Since LOBSTER and VASP runs are not linked by default, I have written this custom workflow schema to connect the underlying DFT and LOBSTER runs. At this point it works fine and connects all the entries as expected. The issue I am currently facing is that the custom workflow YAML (workflow.archive.yaml) seems to be parsed before the entries it references, which leads to failures, especially when filling in the charge_spilling quantity in the workflow schema. I tested this separately by uploading the workflow.archive.yaml file after the VASP and LOBSTER entries had been parsed successfully, and then there were no errors. Any tips on how to delay the execution of this workflow schema until all required entries are parsed successfully?

Additionally, this warning always shows up in the logs of workflow.archive.yaml. I found that the SimulationWorkflow class expects the methods and results sections, and since I do not set them explicitly here, it prints this warning. Is it fine to ignore this? I did not find anything in the documentation about defining them here.

image
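While the workflow YAML has to be uploaded after the entries it references, a small pre-upload check can list which referenced mainfiles are still missing. This is only a sketch under stated assumptions: the helper names are hypothetical, the workflow is assumed to be already loaded into a Python dict, and `processed` stands for whatever record of successfully parsed mainfiles is available. No NOMAD API is called.

```python
from typing import Any, Iterable, List, Set

# Reference prefix used in the workflow.archive.yaml from this thread.
ARCHIVE_PREFIX = "../upload/archive/mainfile/"


def referenced_mainfiles(node: Any) -> Set[str]:
    """Recursively collect the mainfile paths found in reference strings."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= referenced_mainfiles(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_mainfiles(item)
    elif isinstance(node, str) and node.startswith(ARCHIVE_PREFIX):
        # Keep only the part between the prefix and the '#' section separator.
        path, _, _ = node[len(ARCHIVE_PREFIX):].partition("#")
        found.add(path)
    return found


def missing_entries(workflow: Any, processed: Iterable[str]) -> List[str]:
    """Return referenced mainfiles that are not in the processed collection."""
    return sorted(referenced_mainfiles(workflow) - set(processed))
```

If `missing_entries` returns an empty list, every mainfile the workflow points to has already been handled, and the workflow file can be uploaded.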

@JFRudzinski
Collaborator

Hi @naik-aakash. Indeed, this is a known problem. The only solution at the moment is to upload the workflow YAML separately, after the DFT calculation(s), as you mentioned. Alternatively, the identification and linking of the DFT calculation can be implemented in the parser, which makes it possible to specify the order of parsing. That takes a bit more effort.

We could discuss the latter approach with you, but we actually have a meeting about YAML workflows today, in which I will bring up this issue to discuss other potential solutions. I will let you know the outcome in the coming days.

@naik-aakash
Contributor Author

Thanks @JFRudzinski , looking forward to solutions 😄

@naik-aakash
Contributor Author

naik-aakash commented Jun 20, 2025

Hi @JFRudzinski, @ndaelman-hu, @ladinesa, @ondracka, do you have any suggestions for a workaround, or can I just proceed with this PR and upload the YAML file separately?

@JFRudzinski JFRudzinski requested a review from ladinesa June 20, 2025 09:01
@JFRudzinski
Collaborator

Hi @naik-aakash, sorry for the delay in updating you. For now, you should move forward with the current approach. We hope to address the current issues soon, but this should not hold you back.

I have added @ladinesa to review this PR. Please wait until he approves before merging.

@naik-aakash naik-aakash marked this pull request as draft June 20, 2025 09:25
@naik-aakash
Contributor Author

naik-aakash commented Jun 20, 2025

Thanks @JFRudzinski, and no worries. I will try to wrap this up in the next few days.

@naik-aakash
Contributor Author

> I tested it locally and it does work. Do you have the vasprun.xml or OUTCAR file in the same upload?

Yes, they are in the same upload.

The directory structure is like this. Does it matter if the files inside the directory are compressed? All the output files in the directory are .gz-compressed.
mp-xx/vaspoutouts+lobsteroutputs

@ladinesa
Collaborator

Ah yes, I forgot about compression. I will make the necessary change.
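The compression issue comes down to looking for the DFT mainfile under its plain name only. A lookup that also tries common compression suffixes could look like the sketch below. The function name, the suffix list, and the preference for vasprun.xml over OUTCAR are assumptions made for illustration; this is not the change that was made in the parser.

```python
from pathlib import Path
from typing import Optional

# Candidate mainfile names and the compression suffixes to try (assumed lists).
DFT_MAINFILE_NAMES = ("vasprun.xml", "OUTCAR")
COMPRESSION_SUFFIXES = ("", ".gz", ".bz2", ".xz")


def find_dft_mainfile(directory: Path) -> Optional[Path]:
    """Return the first VASP mainfile in the directory, compressed or not."""
    for name in DFT_MAINFILE_NAMES:
        for suffix in COMPRESSION_SUFFIXES:
            candidate = directory / f"{name}{suffix}"
            if candidate.is_file():
                return candidate
    # Nothing matched: the LOBSTER run has no DFT calculation alongside it.
    return None
```

With a directory that holds `vasprun.xml.gz` next to `lobsterout.gz`, the function returns the path to `vasprun.xml.gz`; without any VASP output it returns `None`.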

@naik-aakash
Contributor Author

Ah I see, okay. I also got an error that the _child_archives attribute is not available for LobsterParser, which I worked around locally by simply setting it in init.
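The local workaround mentioned here follows a general pattern: define the attribute in the constructor so that later reads never raise `AttributeError`. The class below is a hypothetical stand-in that only illustrates the pattern. It is not the real LobsterParser, and the `parse` signature is an assumption.

```python
from typing import Dict, List, Optional


class SketchParser:
    """Hypothetical stand-in for a parser that tracks child archives."""

    def __init__(self) -> None:
        # Defined up front so the attribute exists even if no child
        # archives are ever passed in.
        self._child_archives: Dict[str, object] = {}

    def parse(
        self, mainfile: str, child_archives: Optional[Dict[str, object]] = None
    ) -> List[str]:
        """Store any child archives and return their keys in sorted order."""
        if child_archives is not None:
            self._child_archives = child_archives
        return sorted(self._child_archives)
```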

@ladinesa
Collaborator

Can you check whether it works now?

@naik-aakash
Contributor Author

I tried it: it creates a SimulationWorkflow entry, but the DFT calculations are not referenced. It seems that vasprun.xml is no longer read.

@ladinesa
Collaborator

Do you have electronicparsers installed locally? VASP is parsed separately and is simply referenced. I may have to set the parser level to a higher value for LOBSTER so that it parses after VASP.
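The idea of a parser level can be pictured as a sort key: parsers with a lower level run first, so giving LOBSTER a higher level than VASP means the VASP entry already exists when LOBSTER tries to reference it. The sketch below only illustrates that ordering. `ParserSpec` and `processing_order` are hypothetical names, and how NOMAD actually schedules parsers is not shown in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParserSpec:
    """Hypothetical description of a parser and its execution level."""

    name: str
    level: int = 0


def processing_order(parsers: List[ParserSpec]) -> List[str]:
    """Return parser names sorted by ascending level.

    sorted() is stable, so parsers that share a level keep their input order.
    """
    return [spec.name for spec in sorted(parsers, key=lambda spec: spec.level)]
```

For example, `processing_order([ParserSpec("lobster", level=1), ParserSpec("vasp", level=0)])` puts `vasp` before `lobster`.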

@naik-aakash
Contributor Author

Yes, VASP gets parsed successfully. Only the reference is no longer generated in the SimulationWorkflow entry. It seemed to be generated correctly as of commit fcc5c66, when I uploaded my data with an uncompressed vasprun.xml.

@naik-aakash
Contributor Author

Hi @ladinesa, it seems to work. I had uploaded something wrong. I will take it up from this implementation and then finalize this PR.

@ladinesa
Collaborator

ladinesa commented Jun 26, 2025

I will simply merge #61 and you can rebase and work from there if you need to extend it further

@naik-aakash
Contributor Author

thanks a lot 😄

@naik-aakash
Contributor Author

naik-aakash commented Jun 26, 2025

Hi @ladinesa, I adapted your generic implementation to work with the LOBSTERWorkflow class, and the output is now generated automatically like this. Thank you again for your help.

image

If you are fine with the current implementation and have no further comments, it would be great if a new release could be made available after merging this PR, so that I can test a bit more on the NOMAD develop server before starting to upload all of my data.

@naik-aakash naik-aakash marked this pull request as ready for review June 26, 2025 17:35
@naik-aakash naik-aakash marked this pull request as draft June 26, 2025 18:02
@naik-aakash naik-aakash marked this pull request as ready for review June 26, 2025 20:34
@naik-aakash naik-aakash marked this pull request as draft June 27, 2025 07:37
@naik-aakash naik-aakash requested a review from ladinesa June 27, 2025 08:47
@naik-aakash naik-aakash marked this pull request as ready for review June 27, 2025 08:47
@ladinesa
Collaborator

Hi sorry will not be able to look at this today.

@naik-aakash naik-aakash marked this pull request as draft July 1, 2025 05:53
Development

Successfully merging this pull request may close these issues.

Adding workflow.yaml to Lobster parser
4 participants