DEPR: DataFrame.get_dtype_counts #27145

Merged
merged 27 commits into from
Jul 3, 2019
4 changes: 2 additions & 2 deletions doc/source/getting_started/basics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1968,11 +1968,11 @@ dtype of the column will be chosen to accommodate all of the data types
pd.Series([1, 2, 3, 6., 'foo'])

The number of columns of each type in a ``DataFrame`` can be found by calling
:meth:`~DataFrame.get_dtype_counts`.
``DataFrame.dtypes.value_counts()``.

.. ipython:: python

dft.get_dtype_counts()
dft.dtypes.value_counts()

Numeric dtypes will propagate and can coexist in DataFrames.
If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
Expand Down
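A minimal sketch of the replacement documented in this hunk. The frame `df` and its column names are made up for illustration; only `DataFrame.dtypes` and `Series.value_counts()` come from the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; dtypes are spelled out so the result does not
# depend on platform defaults.
df = pd.DataFrame({
    "a": np.array([1.0, 2.0, 3.0], dtype="float64"),
    "b": np.array([4.0, 5.0, 6.0], dtype="float64"),
    "c": np.array([1, 2, 3], dtype="int64"),
    "d": np.array([True, False, True], dtype="bool"),
})

# One entry per dtype, holding the number of columns of that dtype.
counts = df.dtypes.value_counts()

# The index holds dtype objects rather than strings; convert them to
# names if a plain dict keyed by dtype name is wanted.
counts_by_name = {str(dtype): int(n) for dtype, n in counts.items()}
print(counts_by_name)
```

Unlike `get_dtype_counts()`, the index of the result contains dtype objects, which is why the test changes further down compare against `np.dtype(...)` values.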
2 changes: 1 addition & 1 deletion doc/source/user_guide/io.rst
Expand Up @@ -3754,7 +3754,7 @@ defaults to `nan`.
store.append('df_mixed', df_mixed, min_itemsize={'values': 50})
df_mixed1 = store.select('df_mixed')
df_mixed1
df_mixed1.get_dtype_counts()
df_mixed1.dtypes.value_counts()

# we have provided a minimum string column size
store.root.df_mixed.table
Expand Down
2 changes: 1 addition & 1 deletion doc/source/user_guide/missing_data.rst
Expand Up @@ -105,7 +105,7 @@ pandas objects provide compatibility between ``NaT`` and ``NaN``.
df2
df2.loc[['a', 'c', 'h'], ['one', 'timestamp']] = np.nan
df2
df2.get_dtype_counts()
df2.dtypes.value_counts()

.. _missing.inserting:

Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.10.1.rst
Expand Up @@ -89,7 +89,7 @@ You can now store ``datetime64`` in data columns
store.append('df_mixed', df_mixed)
df_mixed1 = store.select('df_mixed')
df_mixed1
df_mixed1.get_dtype_counts()
df_mixed1.dtypes.value_counts()

You can pass ``columns`` keyword to select to filter a list of the return
columns, this is equivalent to passing a
Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.11.0.rst
Expand Up @@ -296,7 +296,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet
df

# datetime64[ns] out of the box
df.get_dtype_counts()
df.dtypes.value_counts()

# use the traditional nan, which is mapped to NaT internally
df.loc[df.index[2:4], ['A', 'timestamp']] = np.nan
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Expand Up @@ -762,6 +762,7 @@ Other deprecations
- :meth:`Series.put` is deprecated. (:issue:`18262`)
- :meth:`Index.item` and :meth:`Series.item` is deprecated. (:issue:`18262`)
- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`).
- :meth:`DataFrame.get_dtype_counts` is deprecated. (:issue:`18262`)

.. _whatsnew_0250.prior_deprecations:

Expand Down
6 changes: 3 additions & 3 deletions pandas/core/computation/expressions.py
Expand Up @@ -79,11 +79,11 @@ def _can_use_numexpr(op, op_str, a, b, dtype_check):
# check for dtype compatibility
dtypes = set()
for o in [a, b]:
if hasattr(o, 'get_dtype_counts'):
s = o.get_dtype_counts()
if hasattr(o, 'dtypes'):
s = o.dtypes.value_counts()
if len(s) > 1:
return False
dtypes |= set(s.index)
dtypes |= set(s.index.astype(str))
elif isinstance(o, np.ndarray):
dtypes |= {o.dtype.name}

Expand Down
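The `astype(str)` call in this hunk matters because `dtypes.value_counts()` is indexed by dtype objects, while the `ndarray` branch collects dtype names as strings. The sketch below shows the same idea in isolation. The function name `collect_dtype_names` is hypothetical, and the `ndim` guard is an assumption that is not in the hunk above: it is added because a `Series` also has a `.dtypes` attribute, which is a single dtype with no `value_counts()`.

```python
import numpy as np
import pandas as pd


def collect_dtype_names(operands):
    """Return the set of dtype names across operands, or None if any
    frame mixes dtypes (hypothetical helper, not the pandas function)."""
    names = set()
    for o in operands:
        # ndim guard is an assumption of this sketch, see the lead-in.
        if hasattr(o, "dtypes") and getattr(o, "ndim", 0) > 1:
            s = o.dtypes.value_counts()
            if len(s) > 1:
                return None
            # Index of dtype objects -> dtype names as strings.
            names |= set(s.index.astype(str))
        elif isinstance(o, np.ndarray):
            names |= {o.dtype.name}
    return names


float_frame = pd.DataFrame(np.ones((3, 2), dtype="float64"))
mixed_frame = pd.DataFrame({
    "x": np.array([1, 2, 3], dtype="int64"),
    "y": np.array([1.0, 2.0, 3.0], dtype="float64"),
})

same = collect_dtype_names([float_frame, np.ones(3, dtype="float64")])
mixed = collect_dtype_names([mixed_frame, float_frame])
```

Without the string conversion, the set could hold both a dtype object and a dtype name for the same dtype.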
2 changes: 1 addition & 1 deletion pandas/core/frame.py
Expand Up @@ -2326,7 +2326,7 @@ def _sizeof_fmt(num, size_qualifier):
else:
_verbose_repr()

counts = self.get_dtype_counts()
counts = self._data.get_dtype_counts()
Contributor

this one?

Member Author

I think this is okay. It's internal usage, and I would expect it to be slightly more performant than dtypes.value_counts() (the counts are left as a dictionary instead of being used to construct a Series).

Contributor

Can you remove get_dtype_counts() from blocks? It's unnecessary as well.

Member Author

It looks to be needed to get the dtypes later on for info?

dtypes = ['{k}({kk:d})'.format(k=k[0], kk=k[1]) for k
in sorted(counts.items())]
lines.append('dtypes: {types}'.format(types=', '.join(dtypes)))
Expand Down
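For illustration, the summary line that `info()` builds from these counts can be approximated with public API only. This is a sketch, not the internal code path: it derives the dictionary from `dtypes.value_counts()` instead of the internal `_data.get_dtype_counts()`, and the frame `df` is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with explicit dtypes.
df = pd.DataFrame({
    "a": np.array([1.0, 2.0], dtype="float64"),
    "b": np.array([3.0, 4.0], dtype="float64"),
    "c": np.array([1, 2], dtype="int64"),
    "d": np.array([True, False], dtype="bool"),
})

# Dictionary of dtype name -> number of columns, built from public API.
counts = {str(k): int(v) for k, v in df.dtypes.value_counts().items()}

# Same formatting as in the hunk above: sorted "name(count)" entries.
dtypes = ["{k}({kk:d})".format(k=k, kk=kk) for k, kk in sorted(counts.items())]
line = "dtypes: {types}".format(types=", ".join(dtypes))
print(line)  # dtypes: bool(1), float64(2), int64(1)
```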
8 changes: 8 additions & 0 deletions pandas/core/generic.py
Expand Up @@ -5263,6 +5263,10 @@ def get_dtype_counts(self):
"""
Return counts of unique dtypes in this object.

.. deprecated:: 0.25.0

Use `.dtypes.value_counts()` instead.

Returns
-------
dtype : Series
Expand All @@ -5288,6 +5292,10 @@ def get_dtype_counts(self):
object 1
dtype: int64
"""
warnings.warn("`get_dtype_counts` has been deprecated and will be "
Contributor

Can you update the docstring and add the deprecated directive?

Contributor

Can we recommend .dtypes.value_counts() here instead? Or... we're in generic.py so that may be too hard?

Member Author

Yeah, unfortunately that solution does not work for Series, but I could add "for DataFrames use .dtypes.value_counts()".

Contributor

Sure, we just need something as a replacement (you may also want to add it in the docstring itself).

"removed in a future version. For DataFrames use "
"`.dtypes.value_counts()", FutureWarning,
stacklevel=2)
from pandas import Series
return Series(self._data.get_dtype_counts())

Expand Down
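The deprecation pattern in this hunk (emit a `FutureWarning` with `stacklevel=2`, then keep returning the old result) can be sketched on its own. The wrapper name `get_dtype_counts_compat` is hypothetical, and for simplicity it returns `dtypes.value_counts()` rather than the internal block-manager counts.

```python
import warnings

import numpy as np
import pandas as pd


def get_dtype_counts_compat(df):
    """Hypothetical stand-in for a deprecated method."""
    # stacklevel=2 attributes the warning to the caller's line rather
    # than to this function.
    warnings.warn(
        "`get_dtype_counts` has been deprecated and will be removed in a "
        "future version. For DataFrames use `.dtypes.value_counts()`",
        FutureWarning,
        stacklevel=2,
    )
    return df.dtypes.value_counts()


df = pd.DataFrame({"x": np.array([1, 2, 3], dtype="int64")})

# Record warnings so the deprecation can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_dtype_counts_compat(df)

print(caught[0].category.__name__, len(result))
```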
10 changes: 5 additions & 5 deletions pandas/tests/frame/test_api.py
Expand Up @@ -7,8 +7,8 @@

import pandas as pd
from pandas import (
Categorical, DataFrame, Series, SparseDataFrame, compat, date_range,
timedelta_range)
Categorical, DataFrame, Series, SparseDataFrame, SparseDtype, compat,
date_range, timedelta_range)
import pandas.util.testing as tm
from pandas.util.testing import (
assert_almost_equal, assert_frame_equal, assert_series_equal)
Expand Down Expand Up @@ -433,11 +433,11 @@ def test_with_datetimelikes(self):
'B': timedelta_range('1 day', periods=10)})
t = df.T

result = t.get_dtype_counts()
result = t.dtypes.value_counts()
if self.klass is DataFrame:
expected = Series({'object': 10})
expected = Series({np.dtype('object'): 10})
else:
expected = Series({'Sparse[object, nan]': 10})
expected = Series({SparseDtype(dtype=object): 10})
tm.assert_series_equal(result, expected)


Expand Down
8 changes: 4 additions & 4 deletions pandas/tests/frame/test_arithmetic.py
Expand Up @@ -273,8 +273,8 @@ def test_df_flex_cmp_constant_return_types(self, opname):
df = pd.DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})
const = 2

result = getattr(df, opname)(const).get_dtype_counts()
tm.assert_series_equal(result, pd.Series([2], ['bool']))
result = getattr(df, opname)(const).dtypes.value_counts()
tm.assert_series_equal(result, pd.Series([2], index=[np.dtype(bool)]))

@pytest.mark.parametrize('opname', ['eq', 'ne', 'gt', 'lt', 'ge', 'le'])
def test_df_flex_cmp_constant_return_types_empty(self, opname):
Expand All @@ -283,8 +283,8 @@ def test_df_flex_cmp_constant_return_types_empty(self, opname):
const = 2

empty = df.iloc[:0]
result = getattr(empty, opname)(const).get_dtype_counts()
tm.assert_series_equal(result, pd.Series([2], ['bool']))
result = getattr(empty, opname)(const).dtypes.value_counts()
tm.assert_series_equal(result, pd.Series([2], index=[np.dtype(bool)]))


# -------------------------------------------------------------------
Expand Down
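The expected values in these tests change from the string `'bool'` to `np.dtype(bool)` because the index of `dtypes.value_counts()` holds dtype objects. A small sketch using the same frame as the test, with `gt` chosen arbitrarily as the comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]})

# A flex comparison against a constant yields boolean columns.
result = df.gt(2).dtypes.value_counts()

# The same holds for an empty slice of the frame.
empty_result = df.iloc[:0].gt(2).dtypes.value_counts()

# The index entries are numpy dtype objects, not strings.
print(list(result.index), int(result.iloc[0]))
```

The sketch inspects the index directly instead of calling `assert_series_equal`, since that comparison also checks attributes such as the Series name, which may vary between pandas versions.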
25 changes: 14 additions & 11 deletions pandas/tests/frame/test_block_internals.py
Expand Up @@ -217,19 +217,21 @@ def test_construction_with_mixed(self, float_string_frame):
df = DataFrame(data)

# check dtypes
result = df.get_dtype_counts().sort_values()
result = df.dtypes
expected = Series({'datetime64[ns]': 3})

# mixed-type frames
float_string_frame['datetime'] = datetime.now()
float_string_frame['timedelta'] = timedelta(days=1, seconds=1)
assert float_string_frame['datetime'].dtype == 'M8[ns]'
assert float_string_frame['timedelta'].dtype == 'm8[ns]'
result = float_string_frame.get_dtype_counts().sort_values()
expected = Series({'float64': 4,
'object': 1,
'datetime64[ns]': 1,
'timedelta64[ns]': 1}).sort_values()
result = float_string_frame.dtypes
expected = Series([np.dtype('float64')] * 4 +
[np.dtype('object'),
np.dtype('datetime64[ns]'),
np.dtype('timedelta64[ns]')],
index=list('ABCD') + ['foo', 'datetime',
'timedelta'])
assert_series_equal(result, expected)

def test_construction_with_conversions(self):
Expand Down Expand Up @@ -409,11 +411,12 @@ def test_get_numeric_data(self):
df = DataFrame({'a': 1., 'b': 2, 'c': 'foo',
'f': Timestamp('20010102')},
index=np.arange(10))
result = df.get_dtype_counts()
expected = Series({'int64': 1, 'float64': 1,
datetime64name: 1, objectname: 1})
result = result.sort_index()
expected = expected.sort_index()
result = df.dtypes
expected = Series([np.dtype('float64'),
np.dtype('int64'),
np.dtype(objectname),
np.dtype(datetime64name)],
index=['a', 'b', 'c', 'f'])
assert_series_equal(result, expected)

df = DataFrame({'a': 1., 'b': 2, 'c': 'foo',
Expand Down
6 changes: 4 additions & 2 deletions pandas/tests/frame/test_combine_concat.py
Expand Up @@ -17,8 +17,10 @@ def test_concat_multiple_frames_dtypes(self):
A = DataFrame(data=np.ones((10, 2)), columns=[
'foo', 'bar'], dtype=np.float64)
B = DataFrame(data=np.ones((10, 2)), dtype=np.float32)
results = pd.concat((A, B), axis=1).get_dtype_counts()
expected = Series(dict(float64=2, float32=2))
results = pd.concat((A, B), axis=1).dtypes
expected = Series([np.dtype('float64')] * 2 +
[np.dtype('float32')] * 2,
index=['foo', 'bar', 0, 1])
assert_series_equal(results, expected)

@pytest.mark.parametrize('data', [
Expand Down
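Several of the updated tests compare `.dtypes` directly instead of counting dtypes, which also pins down which column has which dtype. A sketch of the concat case, with the construction following the test above:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(data=np.ones((10, 2)), columns=["foo", "bar"],
                 dtype=np.float64)
B = pd.DataFrame(data=np.ones((10, 2)), dtype=np.float32)

# Column-wise concat keeps each frame's dtypes; .dtypes is indexed by
# column label, so the check is per column rather than per dtype.
results = pd.concat((A, B), axis=1).dtypes

labels = list(results.index)
names = [str(d) for d in results]
print(labels, names)
```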