ENH: Preserve key order when passing list of dicts to DataFrame on py 3.6+ #27309

Merged
merged 53 commits into from Jul 17, 2019
Changes from 41 commits
afa72b4
ENH: Support new case of implied column ordering in Dataframe()
pilkibun Jul 9, 2019
8a4113c
Safer
pilkibun Jul 9, 2019
408ad8b
Restrict to Index case
pilkibun Jul 9, 2019
b732096
Fix tests
pilkibun Jul 9, 2019
be57fd9
Style
pilkibun Jul 9, 2019
717716b
Fix test
pilkibun Jul 9, 2019
e9d4989
rename
pilkibun Jul 10, 2019
63adbfe
Restrict to PY37
pilkibun Jul 10, 2019
0ed89ff
Style
pilkibun Jul 10, 2019
0a48016
Restrict to PY36
pilkibun Jul 10, 2019
4b73536
Work around fake test failure on PY35
pilkibun Jul 10, 2019
eb64d31
Style
pilkibun Jul 10, 2019
b5db0bc
fix test
pilkibun Jul 10, 2019
76d7d54
ENH: treat dict like OrderedDict for PY36+ in dataframe constructor
pilkibun Jul 11, 2019
4571e16
Restore frame.py
pilkibun Jul 11, 2019
71f3c79
restore test_normalize.py
pilkibun Jul 11, 2019
a3987e7
Skip some json tests on Py35
pilkibun Jul 11, 2019
e72b666
fix test
pilkibun Jul 11, 2019
104c2a7
black
pilkibun Jul 11, 2019
e8c27e5
clean isinstance check
pilkibun Jul 11, 2019
b38f65a
messages
pilkibun Jul 11, 2019
32e5b00
Update test after behavior change
pilkibun Jul 11, 2019
864a116
Ignore column order on py35
pilkibun Jul 11, 2019
9cb4362
clean
pilkibun Jul 11, 2019
355979e
fix
pilkibun Jul 11, 2019
887f201
fix
pilkibun Jul 11, 2019
d65a085
whatsnew
pilkibun Jul 11, 2019
2e82473
fix issue ref
pilkibun Jul 11, 2019
21ec5a7
fix header type of unrelated issue
pilkibun Jul 11, 2019
51ff714
whatsnew
pilkibun Jul 11, 2019
92d83ea
checks
pilkibun Jul 11, 2019
85da582
Update pandas/tests/indexing/test_indexing.py
Jul 12, 2019
4f9228c
Update pandas/tests/indexing/test_indexing.py
Jul 12, 2019
61d833a
Update doc/source/whatsnew/v0.25.0.rst
Jul 12, 2019
79346de
Update doc/source/whatsnew/v0.25.0.rst
Jul 12, 2019
ddcce3e
remove comment
pilkibun Jul 12, 2019
2f22ec9
Checks
pilkibun Jul 12, 2019
3dcacd2
Add import
pilkibun Jul 12, 2019
c28e2fd
CI
pilkibun Jul 12, 2019
4d52802
fix tests
pilkibun Jul 12, 2019
5371de5
CI
pilkibun Jul 11, 2019
9afdec3
whatsnew
pilkibun Jul 12, 2019
e1f5f6b
comment
pilkibun Jul 12, 2019
b8d8e28
whatsnew
pilkibun Jul 12, 2019
807e341
comment
pilkibun Jul 12, 2019
209c922
checks
pilkibun Jul 12, 2019
e3dfa45
docstring
pilkibun Jul 12, 2019
e0749fe
whatsnew
pilkibun Jul 13, 2019
4f815cd
doc comments
jorisvandenbossche Jul 15, 2019
60236e5
typo
pilkibun Jul 15, 2019
f4e6309
whatsnew
pilkibun Jul 15, 2019
10024c1
document parameters
pilkibun Jul 15, 2019
0d194f1
remove wrong description
pilkibun Jul 16, 2019
40 changes: 39 additions & 1 deletion doc/source/whatsnew/v0.25.0.rst
@@ -400,7 +400,7 @@ of ``object`` dtype. :attr:`Series.str` will now infer the dtype data *within* t
 .. _whatsnew_0250.api_breaking.groupby_categorical:

 Categorical dtypes are preserved during groupby
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Previously, columns that were categorical, but not the groupby key(s) would be converted to ``object`` dtype during groupby operations. Pandas now will preserve these dtypes. (:issue:`18502`)

@@ -741,6 +741,44 @@ consistent with NumPy and the rest of pandas (:issue:`21801`).
     cat.argsort()
     cat[cat.argsort()]

+.. _whatsnew_0250.api_breaking.list_of_dict:
+
+Column order is preserved when passing a list of dicts to DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Starting with Python 3.7, the key order of ``dict`` is `guaranteed <https://mail.python.org/pipermail/python-dev/2017-December/151283.html>`_. In practice, this has been true since
+Python 3.6. The :class:`DataFrame` constructor now treats a list of dicts the same way it has
+treated a list of ``OrderedDict`` since v0.19.0: column order follows key order instead of being
+sorted. This change applies only when pandas is running on Python>=3.6 (:issue:`27309`). As a
+consequence, the column order produced by ``DataFrame()`` in such cases has changed.
+
+.. code-block:: ipython
+
+    In [1]: data = [
+       ...:     {'name': 'Joe', 'state': 'NY', 'age': 18},
+       ...:     {'name': 'Jane', 'state': 'KY', 'age': 19}
+       ...: ]
+
+*Previous behavior*: columns were lexicographically sorted
+
+.. code-block:: ipython
+
+    In [1]: pd.DataFrame(data)
+    Out[1]:
+       age  name state
+    0   18   Joe    NY
+    1   19  Jane    KY
+
+*New behavior*: column order follows the key order of the dicts
+
+.. code-block:: ipython
+
+    In [2]: pd.DataFrame(data)
+    Out[2]:
+       name state  age
+    0   Joe    NY   18
+    1  Jane    KY   19
+
 .. _whatsnew_0250.api_breaking.deps:

 Increased minimum versions for dependencies
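
Code that relied on the previously sorted column order from the whatsnew entry above may need adjusting. The following is a minimal sketch of two possible ways to get a fixed column order regardless of Python or pandas version. The variable names `sorted_df` and `explicit_df` are illustrative and are not part of this PR.

```python
import pandas as pd

data = [
    {"name": "Joe", "state": "NY", "age": 18},
    {"name": "Jane", "state": "KY", "age": 19},
]

# Option 1: sort the column labels after construction, which reproduces
# the lexicographic order regardless of how the frame was built.
sorted_df = pd.DataFrame(data).sort_index(axis=1)

# Option 2: state the desired column order explicitly, so the constructor
# does not have to infer it from the dicts at all.
explicit_df = pd.DataFrame(data, columns=["age", "name", "state"])

print(list(sorted_df.columns))
print(list(explicit_df.columns))
```

Passing `columns=` is probably the more robust choice when a specific order matters, since it does not depend on how the order is inferred.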
5 changes: 3 additions & 2 deletions pandas/core/internals/construction.py
@@ -10,7 +10,7 @@
 from pandas._libs import lib
 from pandas._libs.tslibs import IncompatibleFrequency, OutOfBoundsDatetime
 import pandas.compat as compat
-from pandas.compat import raise_with_traceback
+from pandas.compat import PY36, raise_with_traceback

 from pandas.core.dtypes.cast import (
     construct_1d_arraylike_from_scalar,
@@ -538,7 +538,8 @@ def _list_of_series_to_arrays(data, columns, coerce_float=False, dtype=None):
 def _list_of_dict_to_arrays(data, columns, coerce_float=False, dtype=None):
     if columns is None:
         gen = (list(x.keys()) for x in data)
-        sort = not any(isinstance(d, OrderedDict) for d in data)
+        types = (dict, OrderedDict) if PY36 else OrderedDict
+        sort = not any(isinstance(d, types) for d in data)
         columns = lib.fast_unique_multiple_list_gen(gen, sort=sort)

     # assure that they are of the base dict class and not of derived
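
The column-ordering decision in the hunk above can be approximated in pure Python. This is only a sketch: `infer_columns` and `unique_in_first_seen_order` are hypothetical names. The latter stands in for `lib.fast_unique_multiple_list_gen`, which is assumed here to collect keys in first-seen order and sort them only when `sort` is true.

```python
import sys
from collections import OrderedDict

# Stand-in for pandas.compat.PY36 (assumption: it means "Python 3.6 or newer").
PY36 = sys.version_info >= (3, 6)


def unique_in_first_seen_order(key_lists, sort):
    """Union of all keys, in first-seen order, optionally sorted afterwards."""
    seen = set()
    uniques = []
    for keys in key_lists:
        for key in keys:
            if key not in seen:
                seen.add(key)
                uniques.append(key)
    if sort:
        uniques.sort()
    return uniques


def infer_columns(records):
    """Mirror the diff: skip sorting when any record has a trusted key order."""
    types = (dict, OrderedDict) if PY36 else OrderedDict
    sort = not any(isinstance(r, types) for r in records)
    return unique_in_first_seen_order((list(r.keys()) for r in records), sort)


records = [
    {"First": 1, "Second": 4, "Third": 7, "Fourth": 10},
    {"Second": 5, "First": 2, "Fourth": 11, "Third": 8},
    {"Second": 6, "First": 3, "Fourth": 12, "Third": 9, "YYY": 14, "XXX": 13},
]
columns = infer_columns(records)
sorted_columns = unique_in_first_seen_order(
    (list(r.keys()) for r in records), sort=True
)
print(columns)
print(sorted_columns)
```

On Python 3.6+ the first result follows the order in which keys first appear across the records, while the second shows the sorted order that plain dicts produced before this change.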
22 changes: 21 additions & 1 deletion pandas/tests/frame/test_constructors.py
@@ -1119,7 +1119,7 @@ def test_constructor_generator(self):
         expected = DataFrame({0: range(10), 1: "a"})
         tm.assert_frame_equal(result, expected, check_dtype=False)

-    def test_constructor_list_of_dicts(self):
+    def test_constructor_list_of_odicts(self):
         data = [
             OrderedDict([["a", 1.5], ["b", 3], ["c", 4], ["d", 6]]),
             OrderedDict([["a", 1.5], ["b", 3], ["d", 6]]),
@@ -1340,6 +1340,26 @@ def test_constructor_list_of_namedtuples(self):
         result = DataFrame(tuples, columns=["y", "z"])
         tm.assert_frame_equal(result, expected)

+    def test_constructor_list_of_dict_order(self):
+        # GH10056
+        data = [
+            {"First": 1, "Second": 4, "Third": 7, "Fourth": 10},
+            {"Second": 5, "First": 2, "Fourth": 11, "Third": 8},
+            {"Second": 6, "First": 3, "Fourth": 12, "Third": 9, "YYY": 14, "XXX": 13},
+        ]
+        expected = DataFrame(
+            {
+                "First": [1, 2, 3],
+                "Second": [4, 5, 6],
+                "Third": [7, 8, 9],
+                "Fourth": [10, 11, 12],
+                "YYY": [None, None, 14],
+                "XXX": [None, None, 13],
+            }
+        )
+        result = DataFrame(data)
+        tm.assert_frame_equal(result, expected, check_like=not PY36)
+
     def test_constructor_orient(self, float_string_frame):
         data_dict = float_string_frame.T._series
         recons = DataFrame.from_dict(data_dict, orient="index")
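
Several tests in this PR pass `check_like=not PY36` to `tm.assert_frame_equal`. The sketch below illustrates what that flag does, using the public `pandas.testing.assert_frame_equal` (assumed here to behave like the `tm` helper used in the test suite). The frames and the `order_sensitive_failed` flag are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = left[["b", "a"]]  # same data, columns in a different order

# check_like=True ignores the order of the index and columns.
assert_frame_equal(left, right, check_like=True)

# With the default check_like=False, the differing column order is an error.
try:
    assert_frame_equal(left, right)
except AssertionError:
    order_sensitive_failed = True
else:
    order_sensitive_failed = False

print(order_sensitive_failed)
```

With `check_like=not PY36`, the comparison is therefore order-sensitive on Python 3.6+ and order-insensitive on Python 3.5, where dict key order is not reliable.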
8 changes: 6 additions & 2 deletions pandas/tests/indexing/test_indexing.py
@@ -7,6 +7,8 @@
 import numpy as np
 import pytest

+from pandas.compat import PY36
+
 from pandas.core.dtypes.common import is_float_dtype, is_integer_dtype

 import pandas as pd
@@ -230,8 +232,10 @@ def test_setitem_dtype_upcast(self):
         assert df["c"].dtype == np.float64

         df.loc[0, "c"] = "foo"
-        expected = DataFrame([{"a": 1, "c": "foo"}, {"a": 3, "b": 2, "c": np.nan}])
-        tm.assert_frame_equal(df, expected)
+        expected = DataFrame(
+            [{"a": 1, "b": np.nan, "c": "foo"}, {"a": 3, "b": 2, "c": np.nan}]
+        )
+        tm.assert_frame_equal(df, expected, check_like=not PY36)

         # GH10280
         df = DataFrame(
23 changes: 13 additions & 10 deletions pandas/tests/io/json/test_normalize.py
@@ -3,6 +3,8 @@
 import numpy as np
 import pytest

+from pandas.compat import PY36
+
 from pandas import DataFrame, Index
 import pandas.util.testing as tm

@@ -351,9 +353,9 @@ def test_non_ascii_key(self):
         ).decode("utf8")

         testdata = {
+            b"\xc3\x9cnic\xc3\xb8de".decode("utf8"): [0, 1],
             "sub.A": [1, 3],
             "sub.B": [2, 4],
-            b"\xc3\x9cnic\xc3\xb8de".decode("utf8"): [0, 1],
         }
         expected = DataFrame(testdata)

@@ -366,21 +368,21 @@ def test_missing_field(self, author_missing_data):
         ex_data = [
             {
                 "info": np.nan,
-                "author_name.first": np.nan,
-                "author_name.last_name": np.nan,
                 "info.created_at": np.nan,
                 "info.last_updated": np.nan,
+                "author_name.first": np.nan,
+                "author_name.last_name": np.nan,
             },
             {
                 "info": None,
-                "author_name.first": "Jane",
-                "author_name.last_name": "Doe",
                 "info.created_at": "11/08/1993",
                 "info.last_updated": "26/05/2012",
+                "author_name.first": "Jane",
+                "author_name.last_name": "Doe",
             },
         ]
         expected = DataFrame(ex_data)
-        tm.assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected, check_like=not PY36)

     @pytest.mark.parametrize(
         "max_level,expected",
@@ -508,12 +510,13 @@ def test_missing_meta(self, missing_metadata):
             data=missing_metadata, record_path="addresses", meta="name", errors="ignore"
         )
         ex_data = [
-            ["Massillon", 9562, "OH", "Morris St.", 44646, "Alice"],
-            ["Elizabethton", 8449, "TN", "Spring St.", 37643, np.nan],
+            [9562, "Morris St.", "Massillon", "OH", 44646, "Alice"],
+            [8449, "Spring St.", "Elizabethton", "TN", 37643, np.nan],
         ]
         columns = ["city", "number", "state", "street", "zip", "name"]
+        columns = ["number", "street", "city", "state", "zip", "name"]
         expected = DataFrame(ex_data, columns=columns)
-        tm.assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected, check_like=not PY36)

     def test_donot_drop_nonevalues(self):
         # GH21356
@@ -684,7 +687,7 @@ def test_with_large_max_level(self):
                 "CreatedBy.user.family_tree.father.name": "Father001",
                 "CreatedBy.user.family_tree.father.father.Name": "Father002",
                 "CreatedBy.user.family_tree.father.father.father.name": "Father003",
-                "CreatedBy.user.family_tree.father.father.father.father.Name": "Father004",
+                "CreatedBy.user.family_tree.father.father.father.father.Name": "Father004",  # noqa: E501
             }
         ]
         output = nested_to_record(input_data, max_level=max_level)