Description
Feature Type
-
Adding new functionality to pandas
-
Changing existing functionality in pandas
-
Removing existing functionality in pandas
Problem Description
Currently, meta
is used with when record_path
is not None. The logic is to extract both the record_path
and the meta
. For example:
data = [
{
"state": "Florida",
"shortname": "FL",
"info": {"governor": "Rick Scott", "year": 2014},
"counties": [
{"name": "Dade", "population": 12345},
{"name": "Broward", "population": 40000},
{"name": "Palm Beach", "population": 60000},
],
},
{
"state": "Ohio",
"shortname": "OH",
"info": {"governor": "John Kasich", "year": 2015},
"counties": [
{"name": "Summit", "population": 1234},
{"name": "Cuyahoga", "population": 1337},
],
},
]
result = pd.json_normalize(
data, "counties", ["state", "shortname", ["info", "governor"]]
)
result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
In the above example, pd.json_normalize
not only retrieves counties
, but also retrieves state
, shortname
and info.governor
.
When record_path
is not given, meta
is ignored, for example:
result = pd.json_normalize(
data, meta=["state", "shortname", ["info", "governor"]]
)
result
state shortname counties info.governor info.year
0 Florida FL [{'name': 'Dade', 'population': 12345}, {'name... Rick Scott 2014
1 Ohio OH [{'name': 'Summit', 'population': 1234}, {'nam... John Kasich 2015
Feature Description
I would expect when record_path
is None or an empty list, json_normalize
can only extract the meta
.
result = pd.json_normalize(
data, meta=["state", "shortname", ["info", "governor"]]
)
result
shortname state info.governor
0 FL Florida Rick Scott
1 OH Ohio John Kasich
The behavior can be summarized as:
record_path
is None,meta
is None: normalize all records.record_path
is not None,meta
is None: normalize onlyrecord_path
.record_path
is not None,meta
is not None: normalizerecord_path
andmeta
.record_path
is None,meta
is not None: normalize onlymeta
. [This request]
Alternative Solutions
The alternative solution is to use pd.json_normalize
to create the dataframe then select desired columns:
result = pd.json_normalize(data)[["shortname", "state", "info.governor"]]
shortname state info.governor
0 FL Florida Rick Scott
1 OH Ohio John Kasich
However, this is inconvenient that when you specify sep
in pd.json_normalize
, the columns name would also change as f"info{sep}governor"
:
pd.json_normalize(data, sep='-')[["shortname", "state", "info-governor"]]
The most similar feature is DataFrame.from_records columns
parameter, but it is not able to parse inner dictionary.
result = pd.DataFrame.from_records(data, columns=["state", "shortname", "info"])
result
state shortname info
0 Florida FL {'governor': 'Rick Scott'}
1 Ohio OH {'governor': 'John Kasich'}
df = pd.DataFrame.from_records(data, columns=["state", "shortname", ["info", "governor"]])
Traceback (most recent call last):
File "d:\sourcecode\pandas\test.py", line 20, in <module>
df = pd.DataFrame.from_records(data, columns=["state", "shortname", ["info", "governor"]])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Python\Python312\Lib\site-packages\pandas\core\frame.py", line 2491, in from_records
arrays, arr_columns = to_arrays(data, columns)
^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Python\Python312\Lib\site-packages\pandas\core\internals\construction.py", line 837, in to_arrays
arr, columns = _list_of_dict_to_arrays(data, columns)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Python\Python312\Lib\site-packages\pandas\core\internals\construction.py", line 924, in _list_of_dict_to_arrays
content = lib.dicts_to_array(data, list(columns))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "lib.pyx", line 401, in pandas._libs.lib.dicts_to_array
TypeError: unhashable type: 'list'