Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide ImmutableDict & ImmutableMultiDict factory methods which ban duplicate additions (Closes issue #53) #55

Merged
merged 8 commits into from
Jun 20, 2019

Conversation

qpwo
Copy link
Contributor

@qpwo qpwo commented Jun 13, 2019

Initial draft. I forgot some things in PR 54 that @ConstantineLignos pointed out; should I fix those in this PR?

@qpwo qpwo changed the title Provide ImmutableDict & ImmutableMultiDict factory methods which ban duplicate additions (Closes #53) Provide ImmutableDict & ImmutableMultiDict factory methods which ban duplicate additions (Closes issue #53) Jun 13, 2019
@berquist
Copy link
Member

#54 (comment) ? Yes

@codecov
Copy link

codecov bot commented Jun 14, 2019

Codecov Report

Merging #55 into master will increase coverage by 0.1%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #55     +/-   ##
=========================================
+ Coverage   90.66%   90.76%   +0.1%     
=========================================
  Files           6        6             
  Lines         621      639     +18     
=========================================
+ Hits          563      580     +17     
- Misses         58       59      +1
Impacted Files Coverage Δ
immutablecollections/_immutableset.py 90.47% <100%> (+0.12%) ⬆️
immutablecollections/_immutabledict.py 87.8% <93.33%> (+0.76%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1daaa9...14080a9. Read the comment docs.

@qpwo
Copy link
Contributor Author

qpwo commented Jun 19, 2019

Do we want immutablesetmultidict_from_unique_keys and immutablelistmultidict_from_unique_keys and associated kwargs?

@berquist
Copy link
Member

Probably not, because the point of the multidicts is to associate a single key with multiple values. A user would fully expect multiple keys to appear.

@qpwo
Copy link
Contributor Author

qpwo commented Jun 19, 2019

Oh I think this might be ready to merge then

Copy link
Member

@berquist berquist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you already make tests for ImmutableSet?

immutablecollections/_immutabledict.py Outdated Show resolved Hide resolved
@@ -100,7 +100,8 @@ def immutableset(
iteration_order.append(value)
elif forbid_duplicate_elements:
raise ValueError(
"Input collection has duplicate items and forbid_duplicate_elements=False"
"Input collection to `immutableset` contains "
"{!r} twice and forbid_duplicate_elements=True".format(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the dict case, print all the duplicate elements, not just the first one you find.

Co-Authored-By: Eric Berquist <eric.berquist@gmail.com>
@qpwo
Copy link
Contributor Author

qpwo commented Jun 19, 2019

Did you already make tests for ImmutableSet?

Yes that was in PR 54

@qpwo qpwo requested a review from berquist June 19, 2019 18:32
Copy link
Contributor

@ConstantineLignos ConstantineLignos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, one small thing.

immutablecollections/_immutabledict.py Outdated Show resolved Hide resolved
@gabbard
Copy link

gabbard commented Jun 20, 2019

These are the coding standards: https://github.com/isi-nlp/isi-flexnlp/blob/master/docs/coding_standards.rst

I don't see the "boolean arguments should always be keywords" rule there, but I remember discussing it. @berquist , can you add it?

@ConstantineLignos
Copy link
Contributor

@rgabbard Discussion was here, btw: https://github.com/isi-nlp/isi-flexnlp/issues/266#issuecomment-360886337 . (You can see me being quite confused about how this works in Python in that discussion :).)

@berquist berquist merged commit 9127fc2 into master Jun 20, 2019
@berquist berquist deleted the 53-dicts-warn-on-duplication branch June 20, 2019 15:24
keys: List[KT]
# `for x in dict` grabs keys, but `for x in pairs` grabs pairs, so we must branch
if isinstance(iterable, Mapping):
# duplicate keys are possible if input is e.g. a multidict
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping guarantees/requires that .keys return a set (and our MultiDicts don't implement Mapping)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What types should we check for then? dict_keys has some set behavior but fails isinstance:

In [1]: d = {i:i*2 for i in range(10000)}

In [2]: type(d.keys())
Out[2]: dict_keys

In [3]: isinstance(d.keys(), set)
Out[3]: False

In [4]: from typing import Mapping

In [5]: isinstance(d, Mapping)
Out[5]: True

In [6]: from typing import Set

In [7]: isinstance(d.keys(), Set)
Out[7]: False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could branch on whether iterable has .keys, or we could have a weaker debug message that just says some unknown duplicate keys were present

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is actually fine; it's just the comment which is wrong :-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qpwo : We do want to include the duplicate elements. But we want to keep the "happy path" (which will be by far the most common) very fast; once we know there is an error, then we can do the work to figure out what the duplicate elements are (because no one cares if their crash takes an extra millisecond longer :-) )

) -> "ImmutableDict[KT, VT]":
"""
Create an immutable dictionary with the given mappings, but raise ValueError if
*iterable* contains the same item twice. More information in `immutabledict`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring here is not quite accurate - we throw on a duplicate key, not a duplicate item

@@ -92,16 +92,21 @@ def immutableset(
"iteration order, specify disable_order_check=True"
)

duplicated = set() # only used if forbid_duplicate_elements=True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to spend time allocating this unless forbid_duplicate_elements is True

if forbid_duplicate_elements and duplicated:
raise ValueError(
"forbid_duplicate_elements=True, but some elements "
"occur multiple times in input: {}".format(duplicated)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer f-strings to .format

keys = [key for key, value in iterable]
seen: Set[KT] = set()
duplicated: Set[KT] = set()
for key in keys:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is really super-slow compared to the call to dict() hidden inside _RegularDictBackedImmutableDict (which is implemented in C). Instead, I'd suggest doing a two-part approach:

  • if forbid_duplicate_keys is True, then list-ify the input Iterable.
  • after we construct ret, then if forbid_duplicate_keys is True, compare then len of the ret to the len of the input iterable. If there is a mismatch, then do the work to construct the error message

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants