-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide ImmutableDict & ImmutableMultiDict factory methods which ban duplicate additions (Closes issue #53) #55
Conversation
#54 (comment) ? Yes |
Codecov Report
@@ Coverage Diff @@
## master #55 +/- ##
=========================================
+ Coverage 90.66% 90.76% +0.1%
=========================================
Files 6 6
Lines 621 639 +18
=========================================
+ Hits 563 580 +17
- Misses 58 59 +1
Continue to review full report at Codecov.
|
Do we want |
Probably not, because the point of the multidicts is to associate a single key with multiple values. A user would fully expect multiple keys to appear. |
Oh I think this might be ready to merge then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you already make tests for ImmutableSet
?
@@ -100,7 +100,8 @@ def immutableset( | |||
iteration_order.append(value) | |||
elif forbid_duplicate_elements: | |||
raise ValueError( | |||
"Input collection has duplicate items and forbid_duplicate_elements=False" | |||
"Input collection to `immutableset` contains " | |||
"{!r} twice and forbid_duplicate_elements=True".format(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the dict case, print all the duplicate elements, not just the first one you find.
Co-Authored-By: Eric Berquist <eric.berquist@gmail.com>
Yes that was in PR 54 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, one small thing.
These are the coding standards: https://github.com/isi-nlp/isi-flexnlp/blob/master/docs/coding_standards.rst I don't see the "boolean arguments should always be keywords" rule there, but I remember discussing it. @berquist , can you add it? |
@rgabbard Discussion was here, btw: https://github.com/isi-nlp/isi-flexnlp/issues/266#issuecomment-360886337 . (You can see me being quite confused about how this works in Python in that discussion :).) |
keys: List[KT] | ||
# `for x in dict` grabs keys, but `for x in pairs` grabs pairs, so we must branch | ||
if isinstance(iterable, Mapping): | ||
# duplicate keys are possible if input is e.g. a multidict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mapping
guarantees/requires that .keys
return a set (and our MultiDict
s don't implement Mapping
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What types should we check for then? dict_keys
has some set
behavior but fails isinstance
:
In [1]: d = {i:i*2 for i in range(10000)}
In [2]: type(d.keys())
Out[2]: dict_keys
In [3]: isinstance(d.keys(), set)
Out[3]: False
In [4]: from typing import Mapping
In [5]: isinstance(d, Mapping)
Out[5]: True
In [6]: from typing import Set
In [7]: isinstance(d.keys(), Set)
Out[7]: False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could branch on whether iterable
has .keys
, or we could have a weaker debug message that just says some unknown duplicate keys were present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is actually fine; it's just the comment which is wrong :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qpwo : We do want to include the duplicate elements. But we want to keep the "happy path" (which will be by far the most common) very fast; once we know there is an error, then we can do the work to figure out what the duplicate elements are (because no one cares if their crash takes an extra millisecond longer :-) )
) -> "ImmutableDict[KT, VT]": | ||
""" | ||
Create an immutable dictionary with the given mappings, but raise ValueError if | ||
*iterable* contains the same item twice. More information in `immutabledict` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring here is not quite accurate - we throw on a duplicate key, not a duplicate item
@@ -92,16 +92,21 @@ def immutableset( | |||
"iteration order, specify disable_order_check=True" | |||
) | |||
|
|||
duplicated = set() # only used if forbid_duplicate_elements=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason to spend time allocating this unless forbid_duplicate_elements
is True
if forbid_duplicate_elements and duplicated: | ||
raise ValueError( | ||
"forbid_duplicate_elements=True, but some elements " | ||
"occur multiple times in input: {}".format(duplicated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer f-strings to .format
keys = [key for key, value in iterable] | ||
seen: Set[KT] = set() | ||
duplicated: Set[KT] = set() | ||
for key in keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is really super-slow compared to the call to dict()
hidden inside _RegularDictBackedImmutableDict
(which is implemented in C). Instead, I'd suggest doing a two-part approach:
- if
forbid_duplicate_keys
isTrue
, thenlist
-ify the inputIterable
. - after we construct
ret
, then ifforbid_duplicate_keys
isTrue
, compare thenlen
of theret
to thelen
of the input iterable. If there is a mismatch, then do the work to construct the error message
Initial draft. I forgot some things in PR 54 that @ConstantineLignos pointed out; should I fix those in this PR?