6.6.1 #427

Merged: 4 commits merged on Oct 17, 2023
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# DeepDiff Change log

- v6-6-1
- Fix for [DeepDiff raises decimal exception when using significant digits](https://github.com/seperman/deepdiff/issues/426)
- Introducing group_by_sort_key
- Adding group_by 2D. For example `group_by=['last_name', 'zip_code']`
- v6-6-0
- Numpy 2.0 support
- Adding [Delta.to_flat_dicts](https://zepworks.com/deepdiff/current/serialization.html#delta-serialize-to-flat-dictionaries)
16 changes: 11 additions & 5 deletions README.md
@@ -1,4 +1,4 @@
# DeepDiff v 6.6.0
# DeepDiff v 6.6.1

![Downloads](https://img.shields.io/pypi/dm/deepdiff.svg?style=flat)
![Python Versions](https://img.shields.io/pypi/pyversions/deepdiff.svg?style=flat)
@@ -17,15 +17,21 @@

Tested on Python 3.7+ and PyPy3.

- **[Documentation](https://zepworks.com/deepdiff/6.6.0/)**
- **[Documentation](https://zepworks.com/deepdiff/6.6.1/)**

## What is new?

Please check the [ChangeLog](CHANGELOG.md) file for the detailed information.

DeepDiff 6-6-1
- Fix for [DeepDiff raises decimal exception when using significant digits](https://github.com/seperman/deepdiff/issues/426)
- Introducing group_by_sort_key
- Adding group_by 2D. For example `group_by=['last_name', 'zip_code']`


DeepDiff 6-6-0

- [Serialize To Flat Dicts]()
- [Serialize To Flat Dicts](https://zepworks.com/deepdiff/current/serialization.html#delta-to-flat-dicts-label)
- [NumPy 2.0 compatibility](https://github.com/seperman/deepdiff/pull/422) by [William Jamieson](https://github.com/WilliamJamieson)

DeepDiff 6-5-0
@@ -101,11 +107,11 @@ Thank you!

How to cite this library (APA style):

Dehpour, S. (2023). DeepDiff (Version 6.6.0) [Software]. Available from https://github.com/seperman/deepdiff.
Dehpour, S. (2023). DeepDiff (Version 6.6.1) [Software]. Available from https://github.com/seperman/deepdiff.

How to cite this library (Chicago style):

Dehpour, Sep. 2023. DeepDiff (version 6.6.0).
Dehpour, Sep. 2023. DeepDiff (version 6.6.1).

# Authors

2 changes: 1 addition & 1 deletion deepdiff/__init__.py
@@ -1,6 +1,6 @@
"""This module offers the DeepDiff, DeepSearch, grep, Delta and DeepHash classes."""
# flake8: noqa
__version__ = '6.6.0'
__version__ = '6.6.1'
import logging

if __name__ == '__main__':
63 changes: 55 additions & 8 deletions deepdiff/diff.py
@@ -130,6 +130,7 @@ def __init__(self,
exclude_types=None,
get_deep_distance=False,
group_by=None,
group_by_sort_key=None,
hasher=None,
hashes=None,
ignore_encoding_errors=False,
@@ -170,7 +171,7 @@ def __init__(self,
"ignore_private_variables, ignore_nan_inequality, number_to_string_func, verbose_level, "
"view, hasher, hashes, max_passes, max_diffs, zip_ordered_iterables, "
"cutoff_distance_for_pairs, cutoff_intersection_for_pairs, log_frequency_in_sec, cache_size, "
"cache_tuning_sample_size, get_deep_distance, group_by, cache_purge_level, "
"cache_tuning_sample_size, get_deep_distance, group_by, group_by_sort_key, cache_purge_level, "
"math_epsilon, iterable_compare_func, _original_type, "
"ignore_order_func, custom_operators, encodings, ignore_encoding_errors, "
"_parameters and _shared_parameters.") % ', '.join(kwargs.keys()))
@@ -216,6 +217,14 @@ def __init__(self,
self.hasher = hasher
self.cache_tuning_sample_size = cache_tuning_sample_size
self.group_by = group_by
if callable(group_by_sort_key):
self.group_by_sort_key = group_by_sort_key
elif group_by_sort_key:
def _group_by_sort_key(x):
return x[group_by_sort_key]
self.group_by_sort_key = _group_by_sort_key
else:
self.group_by_sort_key = None
self.encodings = encodings
self.ignore_encoding_errors = ignore_encoding_errors
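The `__init__` hunk above accepts `group_by_sort_key` either as a callable or as a key name. Below is a minimal standalone sketch of that normalization, using a hypothetical helper name `normalize_sort_key` (not part of DeepDiff's API). It swaps the nested `_group_by_sort_key` function for `operator.itemgetter`, which returns `x[key]` just like the function in the diff.

```python
from operator import itemgetter
from typing import Any, Callable, Optional, Union

SortKey = Union[str, Callable[[Any], Any], None]


def normalize_sort_key(group_by_sort_key: SortKey) -> Optional[Callable[[Any], Any]]:
    """Hypothetical helper mirroring the branch added to DeepDiff.__init__."""
    if callable(group_by_sort_key):
        # Already a function: use it unchanged.
        return group_by_sort_key
    if group_by_sort_key:
        # A truthy key name: sort each group by row[group_by_sort_key].
        return itemgetter(group_by_sort_key)
    # None (or an empty string): no sorting at all.
    return None


rows = [{'name': 'Jimmy'}, {'name': 'James'}]
print(sorted(rows, key=normalize_sort_key('name')))
```

Passing a string is therefore just shorthand for a key function that indexes each dictionary row.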

@@ -1592,26 +1601,64 @@ def _get_view_results(self, view):
raise ValueError(INVALID_VIEW_MSG.format(view))
return result

@staticmethod
def _get_key_for_group_by(row, group_by, item_name):
try:
return row.pop(group_by)
except KeyError:
logger.error("Unable to group {} by {}. The key is missing in {}".format(item_name, group_by, row))
raise

def _group_iterable_to_dict(self, item, group_by, item_name):
"""
Convert a list of dictionaries into a dictionary of dictionaries
where the key is the value of the group_by key in each dictionary.
"""
group_by_level2 = None
if isinstance(group_by, (list, tuple)):
group_by_level1 = group_by[0]
if len(group_by) > 1:
group_by_level2 = group_by[1]
else:
group_by_level1 = group_by
if isinstance(item, Iterable) and not isinstance(item, Mapping):
result = {}
item_copy = deepcopy(item)
for row in item_copy:
if isinstance(row, Mapping):
try:
key = row.pop(group_by)
except KeyError:
logger.error("Unable to group {} by {}. The key is missing in {}".format(item_name, group_by, row))
raise
result[key] = row
key1 = self._get_key_for_group_by(row, group_by_level1, item_name)
if group_by_level2:
key2 = self._get_key_for_group_by(row, group_by_level2, item_name)
if key1 not in result:
result[key1] = {}
if self.group_by_sort_key:
if key2 not in result[key1]:
result[key1][key2] = []
result_key1_key2 = result[key1][key2]
if row not in result_key1_key2:
result_key1_key2.append(row)
else:
result[key1][key2] = row
else:
if self.group_by_sort_key:
if key1 not in result:
result[key1] = []
if row not in result[key1]:
result[key1].append(row)
else:
result[key1] = row
else:
msg = "Unable to group {} by {} since the item {} is not a dictionary.".format(item_name, group_by, row)
msg = "Unable to group {} by {} since the item {} is not a dictionary.".format(item_name, group_by_level1, row)
logger.error(msg)
raise ValueError(msg)
if self.group_by_sort_key:
if group_by_level2:
for key1, row1 in result.items():
for key2, row in row1.items():
row.sort(key=self.group_by_sort_key)
else:
for key, row in result.items():
row.sort(key=self.group_by_sort_key)
return result
msg = "Unable to group {} by {}".format(item_name, group_by)
logger.error(msg)
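The new `_group_iterable_to_dict` logic can be exercised outside DeepDiff with a standalone sketch. `group_rows` is a hypothetical name, the function is not DeepDiff's method, and it deliberately deviates in one place: it checks the second-level key with `is not None` instead of the diff's truthiness test, so a falsy key such as `0` is still honored.

```python
from collections.abc import Mapping
from copy import deepcopy
from typing import Any, Callable, Hashable, Iterable, Optional, Sequence, Union


def group_rows(
    rows: Iterable[Any],
    group_by: Union[Hashable, Sequence[Hashable]],
    sort_key: Optional[Callable[[Any], Any]] = None,
) -> dict:
    """Hypothetical standalone version of the 1D/2D grouping added in this PR."""
    if isinstance(group_by, (list, tuple)):
        level1 = group_by[0]
        level2 = group_by[1] if len(group_by) > 1 else None
    else:
        level1, level2 = group_by, None

    result: dict = {}
    # Work on a deep copy so popping the grouping keys leaves the caller's data intact.
    for row in deepcopy(list(rows)):
        if not isinstance(row, Mapping):
            raise ValueError(f"Unable to group by {level1!r}: {row!r} is not a dictionary")
        key1 = row.pop(level1)  # raises KeyError if the grouping key is missing
        # Assumption: `is not None` instead of the diff's truthiness check on level 2.
        if level2 is not None:
            key2 = row.pop(level2)
            bucket = result.setdefault(key1, {})
            if sort_key:
                group = bucket.setdefault(key2, [])
                if row not in group:  # identical rows are stored once
                    group.append(row)
            else:
                bucket[key2] = row  # a later row with the same keys overwrites an earlier one
        elif sort_key:
            group = result.setdefault(key1, [])
            if row not in group:
                group.append(row)
        else:
            result[key1] = row

    if sort_key:
        # Sort every list of rows, one level deeper for 2D grouping.
        if level2 is not None:
            groups = [group for bucket in result.values() for group in bucket.values()]
        else:
            groups = list(result.values())
        for group in groups:
            group.sort(key=sort_key)
    return result
```

Without a sort key each group holds a single dictionary (last row wins); with one, each group holds a sorted list, which is why `group_by_sort_key` changes the shape of the grouped data.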
10 changes: 8 additions & 2 deletions deepdiff/helper.py
@@ -8,7 +8,7 @@
import string
import time
from ast import literal_eval
from decimal import Decimal, localcontext
from decimal import Decimal, localcontext, InvalidOperation as InvalidDecimalOperation
from collections import namedtuple
from itertools import repeat
from ordered_set import OrderedSet
@@ -394,7 +394,13 @@ def number_to_string(number, significant_digits, number_format_notation="f"):
# Precision = number of integer digits + significant_digits
# Using number//1 to get the integer part of the number
ctx.prec = len(str(abs(number // 1))) + significant_digits
number = number.quantize(Decimal('0.' + '0' * significant_digits))
try:
number = number.quantize(Decimal('0.' + '0' * significant_digits))
except InvalidDecimalOperation:
# Sometimes rounding up causes a higher precision to be needed for the quantize operation
# For example '999.99999999' will become '1000.000000' after quantize
ctx.prec += 1
number = number.quantize(Decimal('0.' + '0' * significant_digits))
elif isinstance(number, only_complex_number):
# Case for complex numbers.
number = number.__class__(
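The decimal fix in `number_to_string` retries `quantize` with one more digit of precision when rounding carries into a new integer digit. A minimal standalone sketch of that retry, using a hypothetical function name `quantize_with_retry`; it relies only on the standard `decimal` module, where `quantize` signals `InvalidOperation` if the resulting coefficient would exceed the context precision.

```python
from decimal import Decimal, InvalidOperation, localcontext


def quantize_with_retry(number: Decimal, significant_digits: int) -> Decimal:
    """Hypothetical helper: round to `significant_digits` decimal places,
    retrying once with extra precision if rounding adds an integer digit."""
    exponent = Decimal('0.' + '0' * significant_digits)
    with localcontext() as ctx:
        # Precision = number of integer digits + requested fractional digits.
        ctx.prec = len(str(abs(number // 1))) + significant_digits
        try:
            return number.quantize(exponent)
        except InvalidOperation:
            # 999.99999999 rounds to 1000.000000, which needs one more digit.
            ctx.prec += 1
            return number.quantize(exponent)


print(quantize_with_retry(Decimal('999.99999999'), 6))  # 1000.000000
```

Here the first attempt runs with a precision of 9 (3 integer digits + 6), but `1000.000000` has a 10-digit coefficient, so the exception path is what makes the call succeed.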
1 change: 1 addition & 0 deletions deepdiff/serialization.py
@@ -537,6 +537,7 @@ def _serialize_decimal(value):
JSON_CONVERTOR = {
decimal.Decimal: _serialize_decimal,
ordered_set.OrderedSet: list,
set: list,
type: lambda x: x.__name__,
bytes: lambda x: x.decode('utf-8'),
datetime.datetime: lambda x: x.isoformat(),
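The serialization change adds `set: list` to `JSON_CONVERTOR`, since the standard `json` module cannot encode sets. Below is a minimal sketch of how such a type-to-converter table can back the `default=` hook of `json.dumps`. `CONVERTERS` and `json_default` are hypothetical names, and the exact-type lookup is an assumption, not necessarily how DeepDiff consults its table.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical converter table in the spirit of JSON_CONVERTOR (not DeepDiff's API).
CONVERTERS: Dict[type, Callable[[Any], Any]] = {
    set: list,                           # sets become lists (element order is arbitrary)
    type: lambda x: x.__name__,          # classes serialize as their name
    bytes: lambda x: x.decode('utf-8'),
}


def json_default(obj: Any) -> Any:
    """`default` hook for json.dumps: look up a converter by the object's exact type."""
    converter = CONVERTERS.get(type(obj))
    if converter is None:
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
    return converter(obj)


print(json.dumps({'tags': {'a'}, 'kind': int}, default=json_default))
# {"tags": ["a"], "kind": "int"}
```

Because a set has no defined order, the resulting JSON list can differ between runs for sets with more than one element.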
89 changes: 87 additions & 2 deletions docs/basics.rst
@@ -148,9 +148,24 @@ Object attribute added:
Group By
--------

group_by can be used when dealing with list of dictionaries to convert them to group them by value defined in group_by. The common use case is when reading data from a flat CSV and primary key is one of the columns in the CSV. We want to use the primary key to group the rows instead of CSV row number.
group_by can be used when dealing with a list of dictionaries. It converts the list into a single dictionary keyed by the value of the group_by key in each dictionary. The common use case is reading data from a flat CSV where the primary key is one of the columns: we want to group the rows by that primary key instead of by the CSV row number. group_by can also group in 2D when passed a list of 2 keys.

Example:
For example:
>>> [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
... {'id': 'BB', 'name': 'James', 'last_name': 'Blue'},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple'},
... ]

Becomes:
>>> t1 = {
... 'AA': {'name': 'Joe', 'last_name': 'Nobody'},
... 'BB': {'name': 'James', 'last_name': 'Blue'},
... 'CC': {'name': 'Mike', 'last_name': 'Apple'},
... }
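The conversion shown above can be approximated in plain Python with a dictionary comprehension. This only illustrates the resulting shape for a single group_by key; it is not how DeepDiff implements grouping, and `rows` and `grouped` are illustrative names.

```python
rows = [
    {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
    {'id': 'BB', 'name': 'James', 'last_name': 'Blue'},
    {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple'},
]

# Key each row by its 'id' value and drop 'id' from the nested dictionary.
grouped = {
    row['id']: {k: v for k, v in row.items() if k != 'id'}
    for row in rows
}
print(grouped)
```

Once the rows are keyed by 'id', a changed value is reported under its id instead of under a list index.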


With that in mind, let's take a look at the following:
>>> from deepdiff import DeepDiff
>>> t1 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
@@ -187,5 +202,75 @@ Now we use group_by='id':
>>> diff['values_changed'][0].up.up.t1
{'AA': {'name': 'Joe', 'last_name': 'Nobody'}, 'BB': {'name': 'James', 'last_name': 'Blue'}, 'CC': {'name': 'Mike', 'last_name': 'Apple'}}

2D Example:
>>> from pprint import pprint
>>> from deepdiff import DeepDiff
>>>
>>> t1 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
... {'id': 'BB', 'name': 'James', 'last_name': 'Blue'},
... {'id': 'BB', 'name': 'Jimmy', 'last_name': 'Red'},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple'},
... ]
>>>
>>> t2 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody'},
... {'id': 'BB', 'name': 'James', 'last_name': 'Brown'},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple'},
... ]
>>>
>>> diff = DeepDiff(t1, t2, group_by=['id', 'name'])
>>> pprint(diff)
{'dictionary_item_removed': [root['BB']['Jimmy']],
'values_changed': {"root['BB']['James']['last_name']": {'new_value': 'Brown',
'old_value': 'Blue'}}}

.. _group_by_sort_key_label:

Group By - Sort Key
-------------------

group_by_sort_key defines how dictionaries are sorted when multiple ones fall under the same group. When this parameter is used, group_by converts the list of dictionaries into a dictionary that maps each key to a list of dictionaries, and group_by_sort_key is then used to sort the dictionaries within each list.

For example, suppose there are duplicate id values. If we only use group_by='id', one of the dictionaries with the id 'BB' overwrites the other. However, if we also set group_by_sort_key='name', both dictionaries with the id 'BB' are kept, sorted by name.

Example:

[{'id': 'AA', 'int_id': 2, 'last_name': 'Nobody', 'name': 'Joe'},
{'id': 'BB', 'int_id': 20, 'last_name': 'Blue', 'name': 'James'},
{'id': 'BB', 'int_id': 3, 'last_name': 'Red', 'name': 'Jimmy'},
{'id': 'CC', 'int_id': 4, 'last_name': 'Apple', 'name': 'Mike'}]


Becomes:
{'AA': [{'int_id': 2, 'last_name': 'Nobody', 'name': 'Joe'}],
'BB': [{'int_id': 20, 'last_name': 'Blue', 'name': 'James'},
{'int_id': 3, 'last_name': 'Red', 'name': 'Jimmy'}],
'CC': [{'int_id': 4, 'last_name': 'Apple', 'name': 'Mike'}]}
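The difference between plain group_by and group_by with a sort key can be reproduced in plain Python. This sketch is illustrative only: it uses `collections.defaultdict` rather than DeepDiff's own code, and `without_id`, `overwritten`, and `grouped` are hypothetical names.

```python
from collections import defaultdict

rows = [
    {'id': 'AA', 'int_id': 2, 'last_name': 'Nobody', 'name': 'Joe'},
    {'id': 'BB', 'int_id': 20, 'last_name': 'Blue', 'name': 'James'},
    {'id': 'BB', 'int_id': 3, 'last_name': 'Red', 'name': 'Jimmy'},
    {'id': 'CC', 'int_id': 4, 'last_name': 'Apple', 'name': 'Mike'},
]


def without_id(row):
    """Copy of the row without its grouping key."""
    return {k: v for k, v in row.items() if k != 'id'}


# Plain grouping by 'id': the second 'BB' row overwrites the first.
overwritten = {row['id']: without_id(row) for row in rows}

# Grouping with a sort key: keep every row per id, then sort each list by 'name'.
grouped = defaultdict(list)
for row in rows:
    grouped[row['id']].append(without_id(row))
for group in grouped.values():
    group.sort(key=lambda r: r['name'])
```

Only the list-based grouping still contains James's row under 'BB'.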


Example of using group_by_sort_key
>>> t1 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody', 'int_id': 2},
... {'id': 'BB', 'name': 'James', 'last_name': 'Blue', 'int_id': 20},
... {'id': 'BB', 'name': 'Jimmy', 'last_name': 'Red', 'int_id': 3},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple', 'int_id': 4},
... ]
>>>
>>> t2 = [
... {'id': 'AA', 'name': 'Joe', 'last_name': 'Nobody', 'int_id': 2},
... {'id': 'BB', 'name': 'James', 'last_name': 'Brown', 'int_id': 20},
... {'id': 'CC', 'name': 'Mike', 'last_name': 'Apple', 'int_id': 4},
... ]
>>>
>>> diff = DeepDiff(t1, t2, group_by='id', group_by_sort_key='name')
>>>
>>> pprint(diff)
{'iterable_item_removed': {"root['BB'][1]": {'int_id': 3,
'last_name': 'Red',
'name': 'Jimmy'}},
'values_changed': {"root['BB'][0]['last_name']": {'new_value': 'Brown',
'old_value': 'Blue'}}}


Back to :doc:`/index`
8 changes: 8 additions & 0 deletions docs/changelog.rst
@@ -5,6 +5,14 @@ Changelog

DeepDiff Changelog

- v6-6-1

- Fix for `DeepDiff raises decimal exception when using significant
digits <https://github.com/seperman/deepdiff/issues/426>`__
- Introducing group_by_sort_key
- Adding group_by 2D. For example
``group_by=['last_name', 'zip_code']``

- v6-6-0

- Numpy 2.0 support
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -61,9 +61,9 @@
# built documents.
#
# The short X.Y version.
version = '6.6.0'
version = '6.6.1'
# The full version, including alpha/beta/rc tags.
release = '6.6.0'
release = '6.6.1'

load_dotenv(override=True)
DOC_VERSION = os.environ.get('DOC_VERSION', version)
7 changes: 5 additions & 2 deletions docs/diff_doc.rst
@@ -79,8 +79,11 @@ include_obj_callback_strict: function, default = None
get_deep_distance: Boolean, default = False
:ref:`get_deep_distance_label` will get you the deep distance between objects. The distance is a number between 0 and 1 where zero means there is no diff between the 2 objects and 1 means they are very different. Note that this number should only be used to compare the similarity of 2 objects and nothing more. The algorithm for calculating this number may or may not change in the future releases of DeepDiff.

group_by: String, default=None
:ref:`group_by_label` can be used when dealing with list of dictionaries to convert them to group them by value defined in group_by. The common use case is when reading data from a flat CSV and primary key is one of the columns in the CSV. We want to use the primary key to group the rows instead of CSV row number.
group_by: String or a list of size 2, default=None
    :ref:`group_by_label` can be used when dealing with a list of dictionaries. It converts the list into a single dictionary keyed by the value of the group_by key in each dictionary. The common use case is reading data from a flat CSV where the primary key is one of the columns: we want to group the rows by that primary key instead of by the CSV row number. group_by can also group in 2D when passed a list of 2 keys.

group_by_sort_key: String or a function, default=None
    :ref:`group_by_sort_key_label` defines how dictionaries are sorted when multiple ones fall under the same group. When this parameter is used, group_by converts the list of dictionaries into a dictionary that maps each key to a list of dictionaries, and :ref:`group_by_sort_key_label` is then used to sort the dictionaries within each list.

hasher: default = DeepHash.sha256hex
Hash function to be used. If you don't want SHA256, you can use your own hash function
11 changes: 10 additions & 1 deletion docs/index.rst
@@ -4,7 +4,7 @@
contain the root `toctree` directive.
DeepDiff 6.6.0 documentation!
DeepDiff 6.6.1 documentation!
=============================

*******
@@ -31,6 +31,15 @@ The DeepDiff library includes the following modules:
What Is New
***********

DeepDiff 6-6-1
--------------

- Fix for `DeepDiff raises decimal exception when using significant
digits <https://github.com/seperman/deepdiff/issues/426>`__
- Introducing group_by_sort_key
- Adding group_by 2D. For example
``group_by=['last_name', 'zip_code']``

DeepDiff 6-6-0
--------------

2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 6.6.0
current_version = 6.6.1
commit = True
tag = True
tag_name = {new_version}
2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@
if os.environ.get('USER', '') == 'vagrant':
del os.link

version = '6.6.0'
version = '6.6.1'


def get_reqs(filename):