-
Notifications
You must be signed in to change notification settings - Fork 21
Description
TL;DR First of all, thank you for the great project! IMHO, the regexp in the total_activity function should be revised to match the correct species. A possible solution is given below.
The Problem
Whilst reading the code of the total_activity
method of the Solution
class, I got a bit puzzeled by the regexp ("(^|[^A-Z])"+element
) used by the function to filter the species. Let's consider the following example solution:
# minimal example
from phreeqpython import PhreeqPython
pp = PhreeqPython()
solution = pp.add_solution_simple({'FeCl3':10,'HF':20},temperature=23)
print('\n'.join(list(solution.species_activities)))
print(solution.total_activity("F"))
The following species will result from this solution:
Cl-
F-
Fe(OH)2
Fe(OH)2+
Fe(OH)3
Fe(OH)3-
Fe(OH)4-
Fe+2
Fe+3
Fe2(OH)2+4
Fe3(OH)4+5
FeCl+
FeCl+2
FeCl2+
FeCl3
FeF+
FeF+2
FeF2+
FeF3
FeOH+
FeOH+2
H+
H2
H2O
HF
HF2-
O2
OH-
The regular expression mentioned above will match species that do not contain fluorine, because it will match any capital letter F
that isn't preceeded by a capital letter. To demonstrate the behaviour, let's modify the total_activity
function to generate some output:
def total_activity(self, element, units='mmol'):
""" Returns to total of any given species or element (SLOW!) """
total = 0
regexp = "(^|[^A-Z])"+element
for species, amount in self.species_activities.items():
if re.search(regexp, species):
print("Species: " + species + ", Element: " + element + ", Amount: " + str(amount))
total += convert_units(element, amount, to_units=units)
print("total: " + str(total))
return total
This will generate the following output:
Species: F-, Element: F, Amount: 9.295094985612013e-05
total: 0.09295094985612014
Species: Fe(OH)2, Element: F, Amount: 3.2252270278895312e-27
total: 0.09295094985612014
Species: Fe(OH)2+, Element: F, Amount: 7.479271053695942e-08
total: 0.09302574256665709
Species: Fe(OH)3, Element: F, Amount: 5.649525212777142e-13
total: 0.0930257431316096
Species: Fe(OH)3-, Element: F, Amount: 7.520387851310833e-36
total: 0.0930257431316096
Species: Fe(OH)4-, Element: F, Amount: 3.0418282013477725e-20
total: 0.09302574313160963
Species: Fe+2, Element: F, Amount: 4.0496998063442014e-10
total: 0.09302614810159027
Species: Fe+3, Element: F, Amount: 1.0374545118800632e-05
total: 0.1034006932203909
Species: Fe2(OH)2+4, Element: F, Amount: 4.2427618025354223e-10
total: 0.10340111749657116
Species: Fe3(OH)4+5, Element: F, Amount: 7.983518646246977e-15
total: 0.10340111750455468
Species: FeCl+, Element: F, Amount: 1.4089141257288415e-11
total: 0.10340113159369593
Species: FeCl+2, Element: F, Amount: 7.408158533773125e-06
total: 0.11080929012746905
Species: FeCl2+, Element: F, Amount: 8.889809364295044e-07
total: 0.11169827106389855
Species: FeCl3, Element: F, Amount: 2.240547420261682e-09
total: 0.11170051161131882
Species: FeF+, Element: F, Amount: 3.764234436318377e-13
total: 0.11170051198774227
Species: FeF+2, Element: F, Amount: 0.0014820322712886073
total: 1.5937327832763497
Species: FeF2+, Element: F, Amount: 0.005354467779888394
total: 6.948200563164744
Species: FeF3, Element: F, Amount: 0.0007834298040489542
total: 7.731630367213699
Species: FeOH+, Element: F, Amount: 7.052612626238075e-18
total: 7.731630367213706
Species: FeOH+2, Element: F, Amount: 3.8085223355509855e-06
total: 7.735438889549257
I don't think that that is the intended result of this method, because species like FeCl+2
will be matched, that do not contain any fluorine. However, it is not 100% clear to me what the intention of the function is. IMHO, all species that contain a fluorine atom should be listed (not just fluorine, because then we would call the activity("F")
method of the class). We can solve the issues as follows:
The (possible) Solution
First, lets address the matching of iron (Fe). If we simply modify the regular expression to not match any other element with lowercase letters, then this will result in the following expression: "(^|[^A-Z])"+element+"([^a-z])"
With the code above, this will generate the following output:
Species: F-, Element: F, Amount: 9.295094985612013e-05
total: 0.09295094985612014
Species: FeF+, Element: F, Amount: 3.764234436318377e-13
total: 0.09295095023254359
Species: FeF+2, Element: F, Amount: 0.0014820322712886073
total: 1.574983221521151
Species: FeF2+, Element: F, Amount: 0.005354467779888394
total: 6.929451001409546
Species: FeF3, Element: F, Amount: 0.0007834298040489542
total: 7.7128808054585
Much better, but still not quite what we want, because this will disregard the species HF2-
, which is discarded due to the first capturing expression, which prohibits capital letters before the element. Therefore, we can simplify the regex to the following form: element+"([^a-z])"
This will match all fluorine containing species and disregard the rest:
Species: F-, Element: F, Amount: 9.295094985612013e-05
total: 0.09295094985612014
Species: FeF+, Element: F, Amount: 3.764234436318377e-13
total: 0.09295095023254359
Species: FeF+2, Element: F, Amount: 0.0014820322712886073
total: 1.574983221521151
Species: FeF2+, Element: F, Amount: 0.005354467779888394
total: 6.929451001409546
Species: FeF3, Element: F, Amount: 0.0007834298040489542
total: 7.7128808054585
Species: HF2-, Element: F, Amount: 7.366691065371423e-07
total: 7.713617474565037
Therefore, the function should be written as:
def total_activity(self, element, units='mmol'):
""" Returns to total of any given species or element (SLOW!) """
total = 0
regexp = element+"([^a-z])"
for species, amount in self.species_activities.items():
if re.search(regexp, species):
total += convert_units(element, amount, to_units=units)
return total
Scope of the problem
The regular expression won't just fail with fluorine (F) and iron (Fe). This is simply due to the fact, that the single letter elements (like H, C, S, N, F, I, B, K, V, Y, W, P and U) will interfere with two letter elements that start with the same letter (e.g. F and Fe or C and Cr, Cd, Co, Cu, Ce, Cl, Cs, Cf etc.) when using the original regexp. Also, please correct me if I am wrong, but excluding capital letters and forcing to start at the beginning will not result in the desired result.
Please let me know your thoughts on this.