-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errant parsing of parameters that can be a list of integers #396
Comments
This comes from the generic parsing routine get_range_vector. So At the moment a limitation of get_range_vector is that it can only robustly return positive integers. In fact this is (at the moment) all we need it to do. But the header should state this (trap for it?). We could enforce no white space around a range separator. That could break old input files - so I'd like a discussion with @mostofi and @giovannipizzi @jimustafa do you have a use case in which this lead to an error - or is it just from a defensive coding pov? |
I encountered this when working to write a parser for the WIN file (see aiidateam/aiida-wannier90#99 for some context). Splitting on the valid separators should give the same result, so the fact that I would be surprised if there are any WIN files out in the wild that actually rely on |
Another option (partially mentioned above): can we state that the syntax for lists of integers only holds for positive integers? Is there any field where this is not the case? If not, we can say that any negative integer is invalid, ensure that Otherwise (or, in addition) we can also enforce that one cannot put whitespaces around the dashes, but while I see that |
In Python, one might parse >>> re.split(r'[ ,;]', '2, 6-8, 12')
['2', '', '6-8', '', '12'] Here, we get 3 types of elements, ones that are empty (can be ignored), ones that can be parsed as integers, and ones that look like a range, and could be further parsed into a range of integers. In the other cases, we get: >>> re.split(r'[ ,;]', '2, 6 -8, 12')
['2', '', '6', '-8', '', '12']
>>> re.split(r'[ ,;]', '2, 6 - 8, 12')
['2', '', '6', '-', '8', '', '12'] For these, one needs to look at a sequence of elements to try and interpret them as a range. It seems that when treating whitespace as a list separator, the only valid-looking range element is one that does not include whitespace. Otherwise, there needs to be an exception to the rule that whitespace separates elements, which only works when negative integers are not allowed. |
I scanned through the results in https://github.com/search?q=path%3A*.win+exclude_bands&type=code and did find a couple of examples that rely on parsing |
For parameters that can be specified as a list of integers, such as
exclude_bands
andselect_projections
, there is a convenient notation that allows for individual integers and integer ranges to be specified together. For example (from thewannier90-v3.1.0
manual),will yield, in the
nnkp
file, the following:It appears that spaces around the
-
do not matter, so thatgives the same result. Clearly, a negative integer makes no sense here, but it still seems that this should be a parsing error.
The suggestion here is that only integer ranges where there are no spaces around
-
(even6 - 8
) should throw an error.The text was updated successfully, but these errors were encountered: