BUG: fillna maximum recursion depth exceeded in cmp (GH18159). #5

Open
wants to merge 98 commits into
base: master
98 commits
6c074d1
Numpy bool msgpack bugfix (#18395)
PhyNerd Nov 23, 2017
4e09480
[BUG-FIX] DataFrame created with tzinfo cannot use to_dict(orient="re…
bolkedebruin Nov 23, 2017
e6a0ef8
REF: smarter NaN handling in remove_unused_levels() (#18438)
toobaz Nov 23, 2017
04b628f
cleaned up imports (#18264)
tdpetrou Nov 23, 2017
369df07
CLN: ASV attrs_caching benchmark (#18441)
mroeschke Nov 23, 2017
4e98a7b
BUG: Keep float dtype in merge on int and float column (#18352)
reidy-p Nov 23, 2017
41004d9
BUG: prevent coercion to datetime64[ns] when a Series is initialized …
jamestran201-alt Nov 23, 2017
b45325e
BUG: Copy categorical codes if empty (fixes #18051) (#18436)
topper-123 Nov 23, 2017
492040b
CLN: Add teardowns for some benchmarks (#17616) (#18388)
dmanikowski-reef Nov 23, 2017
5e67065
CI: temp skip geopandas downstream tests (GH18456) (#18457)
jorisvandenbossche Nov 23, 2017
e6eac0b
TST: move melt tests to separate file (#18428)
tdpetrou Nov 23, 2017
154c416
Revert "CI: temp skip geopandas downstream tests (GH18456)" (#18466)
jorisvandenbossche Nov 24, 2017
de4b384
BUG: Fix IntervalIndex constructor inconsistencies (#18424)
jschendel Nov 24, 2017
de5faf1
Lint rule to catch incorrect sphinx directives (#18437)
Scorpil Nov 24, 2017
aec3347
CLN/PERF: simplify tslib.get_time_micros (#18389)
jbrockmendel Nov 24, 2017
4fce784
CLN: Replace comprehensions list/set/dict functions with correspondin…
mroeschke Nov 24, 2017
6660638
CLN/DEPR: remove pd.ordered_merge (#18459)
topper-123 Nov 24, 2017
e728f94
Remove unused from datetime.pxd, check for fastpath in ensure_datetim…
jbrockmendel Nov 24, 2017
aaee541
Change UInt64Index._na_value from 0 to np.nan (#18401)
jschendel Nov 24, 2017
412988e
Update clipboard Qt-bindings for flexiblity and Python3 compatibility…
dvincentwest Nov 24, 2017
467ee2b
Allow indices to be mapped through through dictionaries or series (#1…
nateyoder Nov 25, 2017
5cd4cb2
CLN: ASV ctors benchmark (#18479)
mroeschke Nov 25, 2017
200227e
CLN: ASV categoricals benchmark (#18465)
mroeschke Nov 25, 2017
b71ecbd
CLN: ASV binary ops benchmark (#18444)
mroeschke Nov 25, 2017
9c9a09f
BUG: Fix IntervalIndex.insert to allow inserting NaN (#18300)
jschendel Nov 25, 2017
be66ef8
Cross off a few tslibs-TODOs (#18443)
jbrockmendel Nov 25, 2017
0bcd77e
BUG: in Python3 MultiIndex.from_tuples cannot take "zipped" tuples (#…
Xbar Nov 25, 2017
06518b2
Prevent passing invalid kwds to DateOffset constructors (#18226)
jbrockmendel Nov 25, 2017
b69c1a2
BUG: Fix Index.putmask makes stack overflow with an invalid mask (#18…
Licht-T Nov 25, 2017
3d44221
BUG: Fix inaccurate rolling.var calculation (#18481)
Licht-T Nov 25, 2017
1fab808
CLN: ASV Algorithms benchmark (#18423)
mroeschke Nov 25, 2017
20f6512
Propogating NaN values when using str.split (#18450) (#18462)
WillAyd Nov 25, 2017
50f432d
BLD: merge-script.py typo
jreback Nov 25, 2017
38f41e6
CI: remove pandas-gbq from 3.5 build to avoid conflicts with 3.6 buil…
jreback Nov 26, 2017
c44a063
CLN: ASV eval benchmark (#18500)
mroeschke Nov 26, 2017
f1aac43
CLN: ASV frame_ctor benchmark (#18499)
mroeschke Nov 26, 2017
f26bed6
DEPR: Deprecate NDFrame.as_matrix (#18458)
topper-123 Nov 26, 2017
b08c22b
TYPO: IntervalIndex.symmetric_differnce -> IntervalIndex.symmetric_di…
jschendel Nov 26, 2017
78b24b2
parametrize offsets tests (#18494)
jbrockmendel Nov 26, 2017
29206ee
fix missing arg in asvs (#18503)
jbrockmendel Nov 26, 2017
f6fe089
EHN: Improve from_items error message (#17312) (#17881)
reidy-p Nov 26, 2017
5f7d86c
Improved description of seaborn (#18495)
mwaskom Nov 26, 2017
68b66ab
COMPAT: map infers all-nan / empty correctly (#18491)
jreback Nov 26, 2017
d101064
Fix tzaware dates mismatch but no exception raised (#18488)
aschade Nov 26, 2017
982ad07
TST: move gbq back to 3.5 build and remove from BUILD_TEST (#18506)
jreback Nov 26, 2017
674fb96
BUG fixes tuple agg issue 18079 (#18354)
bobhaffner Nov 26, 2017
49ddcd5
simplify skiplist inclusion/cimport to be more cythonize-friendly (#1…
jbrockmendel Nov 27, 2017
f745e52
Implement business_start/end cases for shift_months (#18489)
jbrockmendel Nov 27, 2017
1043a46
move monthrange inside get_first/last_bday, allows nogil (#18512)
jbrockmendel Nov 27, 2017
f7c79be
Added repr string for Grouper and TimeGrouper (#18203)
topper-123 Nov 27, 2017
4fd104a
COMPAT: reading json with lines=True from s3, xref #17200 (#17201)
Nov 27, 2017
262e8ff
BUG: Ignore division by 0 when merging empty dataframes (#17776) (#17…
yeemey Nov 27, 2017
34b036c
TST: Skipif decorator for matplotlib #18190 (#18427)
WillAyd Nov 27, 2017
88ab693
implement shift_quarters --> apply_index for quarters and years (#18522)
jbrockmendel Nov 27, 2017
7463f86
BUG: Index constructor support tupleization for mixed levels (#18514)
toobaz Nov 28, 2017
6148e58
BUG: Fix marker for high memory (#18526)
TomAugspurger Nov 28, 2017
94f3923
remove unused (#18533)
jbrockmendel Nov 28, 2017
2a0e54b
improved DataFrame/SeriesGroupBy.apply doc string (#18534)
topper-123 Nov 28, 2017
32f562d
Fastpaths for Timestamp properties (#18539)
jbrockmendel Nov 29, 2017
48c5bfc
CLN: ASV frame_methods benchmark (#18536)
mroeschke Nov 29, 2017
d3c3c2b
remove arg that is only ever used as NPY_UNSAFE_CASTING; remove code …
jbrockmendel Nov 29, 2017
e459658
Make khash its own extension (#18472)
jbrockmendel Nov 29, 2017
7627cca
check for datetime+period addition (#18524)
jbrockmendel Nov 29, 2017
a47ad56
CLN: ASV remove uncessary selfs and add setups (#18575)
mroeschke Nov 30, 2017
c40c8f8
DOC: clarify default window in rolling method (#18177)
dstansby Nov 30, 2017
67c4d0f
DOC: header='infer' is not working when there is no header, closes #1…
cmazzullo Nov 30, 2017
5da3759
BUG: Fix groupby over a CategoricalIndex in axis=1 (#18525)
ekisslinger Nov 30, 2017
5cd5e3b
Update pandas.read_gbq docs to point to pandas-gbq (#18548)
tswast Dec 1, 2017
1eedcf6
API: change datetimelike Index to raise IndexError instead ValueError…
jorisvandenbossche Dec 1, 2017
d74ac70
CLN/DOC: Interval and IntervalIndex classes (#18585)
jschendel Dec 1, 2017
d5ffb1f
Support merging DataFrames on a combo of columns and index levels (GH…
jonmmease Dec 1, 2017
f7df0ff
BUG: do not fail when stack()ing unsortable level (#18363)
toobaz Dec 1, 2017
d270bbb
Construction of Series from dict containing NaN as key (#18496)
toobaz Dec 1, 2017
d163de7
BLD Added --strict and -r sxX to test scripts (#18598)
nicku33 Dec 1, 2017
e1ba19a
API: empty map should not infer (#18517)
jreback Dec 2, 2017
0e16818
BUG: Unwanted conversion from timedelta to float (#18493) (#18586)
fjdiod Dec 2, 2017
7a3f81a
ENH: Better error message if usecols doesn't match columns (#17310)
AaronCritchley Dec 3, 2017
8172565
BUG: GH17464 MultiIndex now raises an error when levels aren't unique…
cmazzullo Dec 3, 2017
dc5403f
CLN: Move period.pyx to tslibs/period.pyx (#18555)
AaronCritchley Dec 3, 2017
a9e4731
BLD: since we already use setuptools, let's remove the optional logic…
gkonefal-reef Dec 3, 2017
bd9a3e0
STYLE: conform setup.py to use .format string formatting
jreback Dec 3, 2017
6e56195
Cleanup cimports (#18556)
jbrockmendel Dec 3, 2017
aa5b6e6
DEPR: deprecate .asobject property (#18572)
topper-123 Dec 4, 2017
5bf9486
CLN: Remove SparseList from pandas API (#18621)
gfyoung Dec 4, 2017
fe34b32
timestamp/timedelta test cleanup (#18619)
jbrockmendel Dec 4, 2017
73ed6de
DOC: Remove keep=False docs on nlargest/nsmallest (#18617)
AaronCritchley Dec 4, 2017
3e4e4b3
DEPS: require updated python-dateutil, openpyxl (#18182)
jreback Dec 4, 2017
02e72ec
remove unused args, metadata structs (#18567)
jbrockmendel Dec 4, 2017
e99cb9c
Update imports, use nogil version of sqrt (#18557)
jbrockmendel Dec 4, 2017
2c903d5
json_normalize: Make code more pythonic and avoid modification of met…
davidfischer-ch Dec 4, 2017
a764663
BLD: Bump Cython version from 0.23 to 0.24 (#18623)
jschendel Dec 4, 2017
52fefd5
CLN: Remove io.data and io.wb (#18612)
gfyoung Dec 5, 2017
52838e6
CLN: ASV groupby benchmarks (#18611)
mroeschke Dec 5, 2017
c3c04e2
CLN: Remove Categorical.from_array (#18642)
gfyoung Dec 5, 2017
b288d19
BUG: fillna maximum recursion depth exceeded in cmp (GH18159).
gkonefal-reef Nov 22, 2017
ac45112
Applied requested changes
gkonefal-reef Dec 3, 2017
68a5f5e
Moved change log to conversions section
gkonefal-reef Dec 3, 2017
03b2f6b
Lint fixes
gkonefal-reef Dec 8, 2017
166 changes: 84 additions & 82 deletions asv_bench/benchmarks/algorithms.py
@@ -1,7 +1,6 @@
 from importlib import import_module

 import numpy as np
-
 import pandas as pd
 from pandas.util import testing as tm

@@ -12,113 +11,116 @@
     except:
         pass

-class Algorithms(object):
+from .pandas_vb_common import setup # noqa
+
+
+class Factorize(object):
+
     goal_time = 0.2

-    def setup(self):
-        N = 100000
-        np.random.seed(1234)
+    params = [True, False]
+    param_names = ['sort']

-        self.int_unique = pd.Int64Index(np.arange(N * 5))
-        # cache is_unique
-        self.int_unique.is_unique
+    def setup(self, sort):
+        N = 10**5
+        self.int_idx = pd.Int64Index(np.arange(N).repeat(5))
+        self.float_idx = pd.Float64Index(np.random.randn(N).repeat(5))
+        self.string_idx = tm.makeStringIndex(N)

-        self.int = pd.Int64Index(np.arange(N).repeat(5))
-        self.float = pd.Float64Index(np.random.randn(N).repeat(5))
+    def time_factorize_int(self, sort):
+        self.int_idx.factorize(sort=sort)

-        # Convenience naming.
-        self.checked_add = pd.core.algorithms.checked_add_with_arr
+    def time_factorize_float(self, sort):
+        self.float_idx.factorize(sort=sort)

-        self.arr = np.arange(1000000)
-        self.arrpos = np.arange(1000000)
-        self.arrneg = np.arange(-1000000, 0)
-        self.arrmixed = np.array([1, -1]).repeat(500000)
-        self.strings = tm.makeStringIndex(100000)
+    def time_factorize_string(self, sort):
+        self.string_idx.factorize(sort=sort)

-        self.arr_nan = np.random.choice([True, False], size=1000000)
-        self.arrmixed_nan = np.random.choice([True, False], size=1000000)

-        # match
-        self.uniques = tm.makeStringIndex(1000).values
-        self.all = self.uniques.repeat(10)
+class Duplicated(object):

-    def time_factorize_string(self):
-        self.strings.factorize()
+    goal_time = 0.2

-    def time_factorize_int(self):
-        self.int.factorize()
+    params = ['first', 'last', False]
+    param_names = ['keep']

-    def time_factorize_float(self):
-        self.int.factorize()
+    def setup(self, keep):
+        N = 10**5
+        self.int_idx = pd.Int64Index(np.arange(N).repeat(5))
+        self.float_idx = pd.Float64Index(np.random.randn(N).repeat(5))
+        self.string_idx = tm.makeStringIndex(N)

-    def time_duplicated_int_unique(self):
-        self.int_unique.duplicated()
+    def time_duplicated_int(self, keep):
+        self.int_idx.duplicated(keep=keep)

-    def time_duplicated_int(self):
-        self.int.duplicated()
+    def time_duplicated_float(self, keep):
+        self.float_idx.duplicated(keep=keep)

-    def time_duplicated_float(self):
-        self.float.duplicated()
+    def time_duplicated_string(self, keep):
+        self.string_idx.duplicated(keep=keep)

-    def time_match_strings(self):
-        pd.match(self.all, self.uniques)

-    def time_add_overflow_pos_scalar(self):
-        self.checked_add(self.arr, 1)
+class DuplicatedUniqueIndex(object):

-    def time_add_overflow_neg_scalar(self):
-        self.checked_add(self.arr, -1)
+    goal_time = 0.2

-    def time_add_overflow_zero_scalar(self):
-        self.checked_add(self.arr, 0)
+    def setup(self):
+        N = 10**5
+        self.idx_int_dup = pd.Int64Index(np.arange(N * 5))
+        # cache is_unique
+        self.idx_int_dup.is_unique

-    def time_add_overflow_pos_arr(self):
-        self.checked_add(self.arr, self.arrpos)
+    def time_duplicated_unique_int(self):
+        self.idx_int_dup.duplicated()

-    def time_add_overflow_neg_arr(self):
-        self.checked_add(self.arr, self.arrneg)

-    def time_add_overflow_mixed_arr(self):
-        self.checked_add(self.arr, self.arrmixed)
+class Match(object):

-    def time_add_overflow_first_arg_nan(self):
-        self.checked_add(self.arr, self.arrmixed, arr_mask=self.arr_nan)
+    goal_time = 0.2

-    def time_add_overflow_second_arg_nan(self):
-        self.checked_add(self.arr, self.arrmixed, b_mask=self.arrmixed_nan)
+    def setup(self):
+        self.uniques = tm.makeStringIndex(1000).values
+        self.all = self.uniques.repeat(10)

-    def time_add_overflow_both_arg_nan(self):
-        self.checked_add(self.arr, self.arrmixed, arr_mask=self.arr_nan,
-                         b_mask=self.arrmixed_nan)
+    def time_match_string(self):
+        pd.match(self.all, self.uniques)


 class Hashing(object):

     goal_time = 0.2

-    def setup(self):
-        N = 100000
-
-        self.df = pd.DataFrame(
-            {'A': pd.Series(tm.makeStringIndex(100).take(
-                np.random.randint(0, 100, size=N))),
-             'B': pd.Series(tm.makeStringIndex(10000).take(
-                 np.random.randint(0, 10000, size=N))),
-             'D': np.random.randn(N),
-             'E': np.arange(N),
-             'F': pd.date_range('20110101', freq='s', periods=N),
-             'G': pd.timedelta_range('1 day', freq='s', periods=N),
-             })
-        self.df['C'] = self.df['B'].astype('category')
-        self.df.iloc[10:20] = np.nan
-
-    def time_frame(self):
-        hashing.hash_pandas_object(self.df)
-
-    def time_series_int(self):
-        hashing.hash_pandas_object(self.df.E)
-
-    def time_series_string(self):
-        hashing.hash_pandas_object(self.df.B)
-
-    def time_series_categorical(self):
-        hashing.hash_pandas_object(self.df.C)
+    def setup_cache(self):
+        N = 10**5
+
+        df = pd.DataFrame(
+            {'strings': pd.Series(tm.makeStringIndex(10000).take(
+                np.random.randint(0, 10000, size=N))),
+             'floats': np.random.randn(N),
+             'ints': np.arange(N),
+             'dates': pd.date_range('20110101', freq='s', periods=N),
+             'timedeltas': pd.timedelta_range('1 day', freq='s', periods=N)})
+        df['categories'] = df['strings'].astype('category')
+        df.iloc[10:20] = np.nan
+        return df
+
+    def time_frame(self, df):
+        hashing.hash_pandas_object(df)
+
+    def time_series_int(self, df):
+        hashing.hash_pandas_object(df['ints'])
+
+    def time_series_string(self, df):
+        hashing.hash_pandas_object(df['strings'])
+
+    def time_series_float(self, df):
+        hashing.hash_pandas_object(df['floats'])
+
+    def time_series_categorical(self, df):
+        hashing.hash_pandas_object(df['categories'])
+
+    def time_series_timedeltas(self, df):
+        hashing.hash_pandas_object(df['timedeltas'])
+
+    def time_series_dates(self, df):
+        hashing.hash_pandas_object(df['dates'])
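
The rewritten benchmarks rely on two asv conventions: each value in `params` is passed to `setup` and to every `time_*` method, and the object returned by `setup_cache` is handed to each `time_*` method as its first argument. The sketch below imitates only that calling convention in plain Python. `run_benchmark`, `FactorizeLike`, and `HashingLike` are hypothetical names made up for illustration; they are not part of asv or pandas, and the real asv runner does considerably more (process isolation, repeated timing, persisting the cache).

```python
import itertools
import time


class FactorizeLike(object):
    # Toy stand-in for a parameterized benchmark class (hypothetical).
    params = [True, False]
    param_names = ['sort']

    def setup(self, sort):
        self.data = [3, 1, 2] * 1000

    def time_unique(self, sort):
        values = set(self.data)
        if sort:
            sorted(values)


class HashingLike(object):
    # Toy stand-in for a benchmark that shares expensive data (hypothetical).
    def setup_cache(self):
        return {'ints': list(range(1000))}

    def time_sum_ints(self, data):
        sum(data['ints'])


def run_benchmark(cls):
    """Call every time_* method once per parameter combination.

    Simplified imitation of the calling convention only; not asv's runner.
    """
    params = getattr(cls, 'params', [])
    # A flat list means one parameter; a list of lists means several.
    if params and not isinstance(params[0], (list, tuple)):
        params = [params]
    combos = list(itertools.product(*params)) if params else [()]

    # setup_cache runs once per class; its result becomes the first argument.
    cached = ()
    if hasattr(cls, 'setup_cache'):
        cached = (cls().setup_cache(),)

    results = {}
    for name in sorted(dir(cls)):
        if not name.startswith('time_'):
            continue
        for combo in combos:
            bench = cls()
            args = cached + tuple(combo)
            if hasattr(bench, 'setup'):
                bench.setup(*args)
            start = time.perf_counter()
            getattr(bench, name)(*args)
            results[(name, combo)] = time.perf_counter() - start
    return results


factorize_results = run_benchmark(FactorizeLike)
hashing_results = run_benchmark(HashingLike)
print(sorted(factorize_results))
print(sorted(hashing_results))
```

Each key in the returned dictionaries pairs a method name with the parameter tuple it ran under, which mirrors how a class such as `Factorize` yields one timing per `sort` value while `Hashing` builds its DataFrame once and reuses it.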
6 changes: 5 additions & 1 deletion asv_bench/benchmarks/attrs_caching.py
@@ -1,4 +1,5 @@
-from .pandas_vb_common import *
+import numpy as np
+from pandas import DataFrame

 try:
     from pandas.util import cache_readonly
@@ -7,9 +8,11 @@


 class DataFrameAttributes(object):
+
     goal_time = 0.2

     def setup(self):
+        np.random.seed(1234)
         self.df = DataFrame(np.random.randn(10, 6))
         self.cur_index = self.df.index

@@ -21,6 +24,7 @@ def time_set_index(self):


 class CacheReadonly(object):
+
     goal_time = 0.2

     def setup(self):