Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime error reading bp files #1221

Open
annamalai-ps opened this issue Mar 8, 2022 · 7 comments
Open

Runtime error reading bp files #1221

annamalai-ps opened this issue Mar 8, 2022 · 7 comments

Comments

@annamalai-ps
Copy link

annamalai-ps commented Mar 8, 2022

Hi! I have been using openpmd-api for a while and I did not have any issues reading .bp files with openPMD-api in Python. Recently I am facing an issue with reading bp files. When I execute:

series_4h = io.Series("/gpfs/aps/simulations/planewave_08/output-0000/planewave_08/planewave_08.it00000%T.bp"
                      ,io.Access.read_only) 

I get the following error:

RuntimeError: [setAttribute] Value for string attribute 'patchSuffixes' must not be empty!

I am using openpmd-api 0.14.4 and I have attached the bp file.

planewave_08.it00000000.zip

@franzpoeschel
Copy link
Contributor

What is happening here

You have written a dataset with a vector containing one empty string

> bpls planewave_08.it00000000.bp -al
…
  string    /data/0/patchSuffixes                                                       attr   = {""}
…

In reading, the openPMD-api does not distinguish between single-element vectors and scalar values in ADIOS2 attributes, so it sees this as a single empty string. Empty strings however are disallowed because they are not supported by our HDF5 backend and we want to keep things compatible with all backends.

Possible workaround

We could disable this kind of check while reading and restrict the check to writing.
However, I'm not sure if HDF5 actually supports empty strings as elements of vectors and we should additionally safeguard that case. Do you know more here @ax3l?

For now, #1223 implements this workaround and should make things work for you again. With this fix, I can read your dataset without issue.

Better fix

In the output above, bpls is somehow able to distinguish single-element vectors from scalar values. I don't see a way to distinguish both in ADIOS2, but apparently it's possible somehow. If we had that, we could use it to avoid the situation altogether.

@eschnett
Copy link
Contributor

eschnett commented Mar 9, 2022

@franzpoeschel See is_value to distinguish between scalars ("values") and arrays.

I am also surprised that HDF5 wouldn't support empty strings. I was able to create an HDF5 file with an empty string attribute just fine:

$ h5dump /tmp/emptystring.h5
HDF5 "/tmp/emptystring.h5" {
GROUP "/" {
   ATTRIBUTE "string" {
      DATATYPE  H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): ""
      }
   }
}
}

@franzpoeschel
Copy link
Contributor

@franzpoeschel See is_value to distinguish between scalars ("values") and arrays.

So the functionality definitely exists, but it seems like it is currently not exposed in the C++ bindings (your link is from the C bindings). I've already contacted Norbert about this, maybe he sees something that I don't.

I am also surprised that HDF5 wouldn't support empty strings. I was able to create an HDF5 file with an empty string attribute just fine:

$ h5dump /tmp/emptystring.h5
HDF5 "/tmp/emptystring.h5" {
GROUP "/" {
   ATTRIBUTE "string" {
      DATATYPE  H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): ""
      }
   }
}
}

To be fair, the HDF5 backend was written before I started working on openPMD, so Axel will know more about the details here probably

@ax3l
Copy link
Member

ax3l commented Mar 13, 2022

Thanks for investigating this and the first work-around in #1223 👍

HDF5

I think the reason we did not support this is because a couple of the involved C calls in both read and write need extra careful handling if used with empty strings, e.g., for vlen strings: #1084
That said, we officially in openPMD only support constant size strings anyway.

For instance, there are calls like H5Tset_size that behave oddly with zero strings #979 in some situations, and I am kinda don't like to track corner-cases of HDF5 releases over time to support features like storing an empty string (why at all, one might ask).

Just to close the loop for writing as well: Do you have a good use case why we should in write not error out for empty strings? This feels a bit like if we add support for it, then we might loose a lot of time debugging it for old releases of various backends & platforms, etc. triage it with upstream, add extra code paths for work-arounds, and I don't immediately see a use case yet. I am ok-ish to add it in read.

@eschnett
Copy link
Contributor

I checked, and neither Python nor Julia have any issues reading or writing empty string attributes in HDF5. It should be possible to continue to support writing empty strings, even for HDF5 files. The error reported above was for an ADIOS2 file and was thus unrelated to HDF5.

My case for empty string is: It is always good to support empty containers, and this is often a natural way of expressing certain conditions. Programming languages support empty strings. The file mentioned above contains an empty string because that was a natural choice in our writer. It is, of course, always possible to avoid empty strings, but this can be nontrivial.

The file above contains an array of strings with one array element per patch. Avoiding empty strings in an array either requires remapping array indices (that would add much complexity), or adding a sentinel value to each string, or to empty strings. The reader then needs to look for and handle these sentinel values. Paying respect to Fortran, one could replace empty strings with a string that contains only a single space , but it would be much easier if that was handled by a lower level and not the application.

@ax3l
Copy link
Member

ax3l commented Nov 11, 2022

I see, thanks for the detailed example. We usually solves this in openPMD keywords by explicitly having none keywords defined, but I agree this is generally useful.

If we add sufficent testing for it in a PR, then I think it is worth adding support for it. We should check in the tests that ADIOS2 and HDF5 backends handle this well.
If one of them has issues I would ping upstream for fixed as well, so the two have feature partity.

@ax3l
Copy link
Member

ax3l commented Nov 11, 2022

Let's continue this in a new issue for tracking?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants