DOC: Updated the DataFrame.assign docstring #21917

Merged
merged 88 commits into from
Sep 22, 2018
Changes from 1 commit
58942fc
Working on the assign docstring
datapythonista Jul 22, 2018
de61b38
DOC: cont'd simplified examples in DataFrame.assign docstring
aeltanawy Jul 22, 2018
ef49f88
DOC: adjusted docstring examples in DataFrame.assign to illustrate py…
aeltanawy Sep 4, 2018
1fa9bc5
DOC: Adjusted DataFrame.assign docstring
aeltanawy Sep 9, 2018
4cb55a4
DOC: adjusted the grammer in DataFrame.assign docstring.
aeltanawy Sep 11, 2018
7c7bb7a
Fixed loffset with numpy timedelta (#22482)
discort Sep 4, 2018
d96a334
CLN: Rename 'n' to 'repeats' in .repeat methods (#22574)
gfyoung Sep 4, 2018
607d646
DOC: Updating DataFrame.merge docstring (#22141)
elmq0022 Sep 4, 2018
1b11063
TST: Add capture_stderr decorator to test_validate_docstrings (#22543)
WillAyd Sep 4, 2018
3141dfe
BLD: Fix openpyxl to 2.5.5 (#22601)
gfyoung Sep 5, 2018
66d376d
Use dispatch_to_series where possible (#22572)
jbrockmendel Sep 5, 2018
2168e4a
BUG: resample with TimedeltaIndex, fenceposts are off (#22488)
discort Sep 5, 2018
6693d9a
DOC: Update link and description of the Spyder IDE in Ecosystem docs …
CAM-Gerlach Sep 5, 2018
4ed3760
DOC: Improve the docstring of DataFrame.equals() (#22539)
seantchan Sep 5, 2018
25030e2
TST: fixturize series/test_alter_axes.py (#22526)
h-vetinari Sep 5, 2018
bdca5e9
TST: restructure internal extension arrays tests (split between /arra…
jorisvandenbossche Sep 6, 2018
6c7c975
TST: Fix skipping test due to lack of connectivity (#22598)
rhysparry Sep 6, 2018
2d21d9b
API: Add CalendarDay ('CD') offset (#22288)
mroeschke Sep 7, 2018
9b92446
CLN/DEPR: removed deprecated as_indexer arg from str.match() (#22626)
HyunTruth Sep 7, 2018
ec1f7eb
BUG: NaN should have pct rank of NaN (#22600)
gfyoung Sep 8, 2018
1bfe0c4
Set hypothesis healthcheck (#22597)
alimcmaster1 Sep 8, 2018
0ac130d
Implement delegate_names to allow decorating delegated attributes (#2…
jbrockmendel Sep 8, 2018
1faac78
[PERF] use numexpr in dispatch_to_series (#22284)
jbrockmendel Sep 8, 2018
24501d9
Fix incorrect DTI/TDI indexing; warn before dropping tzinfo (#22549)
jbrockmendel Sep 8, 2018
52b1bf5
[CLN] More cython cleanups, with bonus type annotations (#22283)
jbrockmendel Sep 8, 2018
2e21bd0
move rename functionality out of internals (#21924)
jbrockmendel Sep 8, 2018
1a2b524
TST: Continue collecting arithmetic tests (#22559)
jbrockmendel Sep 8, 2018
09a3d6b
BUG: fix failing DataFrame.loc when indexing with an IntervalIndex (#…
sideeye Sep 8, 2018
128cbd9
DOC: Update `month_name` and `day_name` docstrings (#22544)
Peque Sep 8, 2018
f2af1c6
CLN: tests for str.cat (#22575)
h-vetinari Sep 8, 2018
338683e
DOC: Fix to_latex docstring. (#22516)
Moisan Sep 8, 2018
2fda626
TST: add test to io/formats/test_to_html.py to close GH6131 (#22588)
simonjayhawkins Sep 9, 2018
49b560e
DOC/CLN: small whatsnew fixes (#22659)
jschendel Sep 11, 2018
688c8a4
DOC: Add cross references to advanced.rst (#22671)
topper-123 Sep 12, 2018
f3b3694
DOC: Add section on MultiIndex.to_frame() ordering (#22674)
matthewgilbert Sep 12, 2018
6b3e3c2
TST: Avoid DeprecationWarnings (#22646)
jbrockmendel Sep 12, 2018
16725cf
TST: Collect/Use arithmetic test fixtures (#22645)
jbrockmendel Sep 12, 2018
2ec957b
pythonize cython code (#22638)
jbrockmendel Sep 12, 2018
9837dbc
API: register_extension_dtype class decorator (#22666)
TomAugspurger Sep 13, 2018
e371129
TST: Close ZipFile in compression test (#22679)
TomAugspurger Sep 13, 2018
788158d
CLN: Standardize searchsorted signatures (#22670)
gfyoung Sep 13, 2018
243a19e
DEPR: Removed styler shim (#22691)
TomAugspurger Sep 13, 2018
3445e19
TST Use pytest.raises instead of legacy constructs (#22681)
rth Sep 13, 2018
7d6f275
Fix test_sql pytest fixture warnings (#22515)
alimcmaster1 Sep 14, 2018
b151427
API: Add 'name' as argument for index 'to_frame' method (#22580)
henriqueribeiro Sep 14, 2018
dad9b7c
BUG: Incorrect addition of Week(weekday=6) to DatetimeIndex (#22695)
reidy-p Sep 14, 2018
fab723c
ASV: more for str.cat (#22652)
h-vetinari Sep 14, 2018
1761dbc
TST: Test for bug fixed during #22534 discussion (#22694)
jbrockmendel Sep 15, 2018
93628c5
Fix broken link in install.rst (#22716)
ratijas Sep 15, 2018
d950096
BUG: Make sure that sas7bdat parsers memory is initialized to 0 (#216…
troels Sep 15, 2018
831a527
API: Make .shift always copy (Fixes #22397) (#22517)
AaronCritchley Sep 15, 2018
2b81853
TST: Add test of DataFrame.xs() with duplicates (#13719) (#22294)
nmusolino Sep 15, 2018
e5d334f
DEPR: Standardize searchsorted signature (#22672)
gfyoung Sep 15, 2018
2ac80c4
TST/CLN: break up & parametrize tests for df.set_index (#22236)
h-vetinari Sep 15, 2018
a507946
TST: Mock clipboard IO (#22715)
TomAugspurger Sep 16, 2018
9fe3faf
removing superfluous reference to axis in Series.reorder_levels docst…
SandrineP Sep 17, 2018
7afa8a0
CLN/DOC: Refactor timeseries.rst intro and overview (#22728)
mroeschke Sep 17, 2018
006c013
CLN: Remove unused imports in pyx files (#22739)
mroeschke Sep 18, 2018
845b21a
CLN: Removes module pandas.json (#22737)
vitoriahmc Sep 18, 2018
3ec461f
TST/CLN: remove duplicate data file used in tests (unicode_series.csv…
simonjayhawkins Sep 18, 2018
9465a59
BUG: Some sas7bdat files with many columns are not parseable by read_…
troels Sep 18, 2018
bbf119d
DOC: improve doc string for .aggregate and .transform (#22641)
topper-123 Sep 18, 2018
48de0db
BUG: DataFrame.apply not adding a frequency if freq=None (#22150) (#2…
HannahFerch Sep 18, 2018
3c6ad7d
[ENH] pull in warning for dialect change from pandas-gbq. (#22557)
tswast Sep 18, 2018
4310671
DOC: Updating str_repeat docstring (#22571)
JesperDramsch Sep 18, 2018
49f7fc7
use fused types for reshape (#22454)
jbrockmendel Sep 18, 2018
c15d8c0
use fused types for parts of algos_common_helper (#22452)
jbrockmendel Sep 18, 2018
d03ef77
DOC: Updating the docstring of Series.str.extractall (#22565)
lucadonini96 Sep 18, 2018
52a480d
BUG: don't mangle NaN-float-values and pd.NaT (GH 22295) (#22296)
realead Sep 18, 2018
9935305
DOC: Expose ExcelWriter as part of the Generated API (#22359)
newinh Sep 18, 2018
bada277
Test in scripts/validate_docstrings.py that the short summary is alwa…
Moisan Sep 18, 2018
4f000f5
fix raise of TypeError when subtracting timedelta array (#22054)
illegalnumbers Sep 18, 2018
79b8763
Bug: Logical operator of Series with Index (#22092) (#22293)
makbigc Sep 18, 2018
1aaefe5
DOC: Fix Series nsmallest and nlargest docstring/doctests (#22731)
Moisan Sep 18, 2018
9fe0fbc
Fixturize tests/frame/test_api and tests/sparse/frame/test_frame (#22…
h-vetinari Sep 18, 2018
d64c0a8
BUG SeriesGroupBy.mean() overflowed on some integer array (#22653)
troels Sep 18, 2018
0ba7b16
TST: Fail on warning (#22699)
TomAugspurger Sep 18, 2018
73ff71e
BUG: Allow IOErrors when attempting to retrieve default client encodi…
JayOfferdahl Sep 19, 2018
b7d9884
API: Git version (#22745)
alimcmaster1 Sep 19, 2018
22b2e4a
DOC: add more links to the API in advanced.rst (#22746)
topper-123 Sep 19, 2018
27ea656
DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. (#2…
Moisan Sep 19, 2018
4a2a24c
Set up CI with Azure Pipelines (#22760)
azure-pipelines[bot] Sep 19, 2018
96b7d84
CI: Fix travis CI (#22765)
TomAugspurger Sep 19, 2018
113ff50
CI: Publish test summary (#22770)
TomAugspurger Sep 19, 2018
5474d32
BUG: Check types in Index.__contains__ (#22085) (#22602)
yeojin-dev Sep 19, 2018
6c765d3
Merge remote-tracking branch 'upstream/master' into doc
aeltanawy Sep 20, 2018
61e4dee
Merge remote-tracking branch 'upstream/master' into doc
aeltanawy Sep 20, 2018
ecfaf47
Removing -assign from pandas/ci/doctests.sh
aeltanawy Sep 21, 2018
API: Add CalendarDay ('CD') offset (#22288)
mroeschke authored and aeltanawy committed Sep 20, 2018
commit 2d21d9ba68a1ab6040fc7ca713c9da80808b655a
26 changes: 24 additions & 2 deletions doc/source/timeseries.rst
@@ -369,7 +369,7 @@ In practice this becomes very cumbersome because we often need a very long
index with a large number of timestamps. If we need timestamps on a regular
frequency, we can use the :func:`date_range` and :func:`bdate_range` functions
to create a ``DatetimeIndex``. The default frequency for ``date_range`` is a
**calendar day** while the default for ``bdate_range`` is a **business day**:
**day** while the default for ``bdate_range`` is a **business day**:

.. ipython:: python

@@ -886,6 +886,27 @@ normalized after the function is applied.
hour.apply(pd.Timestamp('2014-01-01 23:00'))


.. _timeseries.dayvscalendarday:

Day vs. CalendarDay
~~~~~~~~~~~~~~~~~~~

:class:`Day` (``'D'``) is a timedelta-like offset that respects absolute time
arithmetic and is an alias for 24 :class:`Hour`. This offset is the default
argument to many pandas time-related functions, such as :func:`date_range` and
:func:`timedelta_range`.

:class:`CalendarDay` (``'CD'``) is a relativedelta-like offset that respects
calendar time arithmetic. :class:`CalendarDay` is useful for preserving calendar
day semantics with datetimes that have daylight saving time transitions, i.e.
adding a :class:`CalendarDay` preserves the wall-clock hour across the
transition.

.. ipython:: python

   ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
   ts + pd.offsets.Day(1)
   ts + pd.offsets.CalendarDay(1)
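
The difference between absolute and calendar arithmetic can be reproduced without pandas. The following is a minimal sketch using only the standard library; the two fixed UTC offsets are hard-coded assumptions standing in for ``Europe/Helsinki`` on either side of the 2016-10-30 transition, and the variable names are illustrative rather than part of this change.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for Europe/Helsinki around the
# 2016-10-30 transition instead of consulting a time zone database.
EEST = timezone(timedelta(hours=3), "EEST")  # in force before the transition
EET = timezone(timedelta(hours=2), "EET")    # in force after the transition

ts = datetime(2016, 10, 30, 0, 0, tzinfo=EEST)

# Absolute arithmetic: add exactly 24 elapsed hours on the UTC timeline,
# then express the result in the offset in force afterwards.
absolute = (ts.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(EET)

# Calendar arithmetic: advance the wall-clock date by one day, keep the
# wall-clock time, and attach the offset in force on the new date.
calendar = (ts.replace(tzinfo=None) + timedelta(days=1)).replace(tzinfo=EET)

print(absolute.isoformat())  # 2016-10-30T23:00:00+02:00
print(calendar.isoformat())  # 2016-10-31T00:00:00+02:00
print(calendar - ts)         # 1 day, 1:00:00 (25 elapsed hours)
```

The absolute result lands at 23:00 on the same date because the day of the transition is 25 hours long, whereas the calendar result keeps midnight and therefore spans 25 elapsed hours.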


Parametric Offsets
~~~~~~~~~~~~~~~~~~

@@ -1176,7 +1197,8 @@ frequencies. We will refer to these aliases as *offset aliases*.

"B", "business day frequency"
"C", "custom business day frequency"
"D", "calendar day frequency"
"D", "day frequency"
"CD", "calendar day frequency"
"W", "weekly frequency"
"M", "month end frequency"
"SM", "semi-month end frequency (15th and end of month)"
40 changes: 40 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -285,6 +285,46 @@ that the dates have been converted to UTC
.. ipython:: python

   pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"], utc=True)

.. _whatsnew_0240.api_breaking.calendarday:

CalendarDay Offset
^^^^^^^^^^^^^^^^^^

:class:`Day` and associated frequency alias ``'D'`` were documented to represent
a calendar day; however, arithmetic and operations with :class:`Day` sometimes
respected absolute time instead (i.e. ``Day(n)`` acted identically to ``Timedelta(days=n)``).

*Previous Behavior*:

.. code-block:: ipython

    In [2]: ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')

    # Respects calendar arithmetic
    In [3]: pd.date_range(start=ts, freq='D', periods=3)
    Out[3]:
    DatetimeIndex(['2016-10-30 00:00:00+03:00', '2016-10-31 00:00:00+02:00',
                   '2016-11-01 00:00:00+02:00'],
                  dtype='datetime64[ns, Europe/Helsinki]', freq='D')

    # Respects absolute arithmetic
    In [4]: ts + pd.tseries.frequencies.to_offset('D')
    Out[4]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')

:class:`CalendarDay` and associated frequency alias ``'CD'`` are now available
and respect calendar day arithmetic while :class:`Day` and frequency alias ``'D'``
will now respect absolute time (:issue:`22274`, :issue:`20596`, :issue:`16980`, :issue:`8774`).
See the :ref:`documentation here <timeseries.dayvscalendarday>` for more information.

Addition with :class:`CalendarDay` across a daylight savings time transition:

.. ipython:: python

   ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
   ts + pd.offsets.Day(1)
   ts + pd.offsets.CalendarDay(1)
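
The statement that ``Day(n)`` acted identically to ``Timedelta(days=n)`` can be checked against a calendar-style addition. The sketch below is an illustration only: it uses ``pd.Timedelta`` and ``pd.DateOffset(days=1)`` as stand-ins for the two kinds of arithmetic, neither of which is touched by this change, and it assumes a pandas installation with time zone data available.

```python
import pandas as pd

ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# Absolute arithmetic: one day is exactly 24 elapsed hours.
absolute = ts + pd.Timedelta(days=1)

# Calendar arithmetic (stand-in): the wall-clock time is kept and
# the date advances by one.
calendar = ts + pd.DateOffset(days=1)

print(absolute)
print(calendar)
print(absolute == ts + pd.Timedelta(hours=24))
```

Because the local day of the transition is 25 hours long, the two results differ by one hour.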

.. _whatsnew_0240.api_breaking.period_end_time:

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
112 changes: 59 additions & 53 deletions pandas/core/arrays/datetimes.py
@@ -32,7 +32,7 @@
from pandas.core import ops

from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Tick, Day, generate_range
from pandas.tseries.offsets import Tick, generate_range

from pandas.core.arrays import datetimelike as dtl

@@ -239,56 +239,33 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
start, end, _normalized = _maybe_normalize_endpoints(start, end,
normalize)

tz, inferred_tz = _infer_tz_from_endpoints(start, end, tz)

if hasattr(freq, 'delta') and freq != Day():
# sub-Day Tick
if inferred_tz is None and tz is not None:
# naive dates
if start is not None and start.tz is None:
start = start.tz_localize(tz, ambiguous=False)

if end is not None and end.tz is None:
end = end.tz_localize(tz, ambiguous=False)

if start and end:
if start.tz is None and end.tz is not None:
start = start.tz_localize(end.tz, ambiguous=False)

if end.tz is None and start.tz is not None:
end = end.tz_localize(start.tz, ambiguous=False)

tz, _ = _infer_tz_from_endpoints(start, end, tz)

if tz is not None:
# Localize the start and end arguments
start = _maybe_localize_point(
start, getattr(start, 'tz', None), start, freq, tz
)
end = _maybe_localize_point(
end, getattr(end, 'tz', None), end, freq, tz
)
if start and end:
# Make sure start and end have the same tz
start = _maybe_localize_point(
start, start.tz, end.tz, freq, tz
)
end = _maybe_localize_point(
end, end.tz, start.tz, freq, tz
)
if freq is not None:
if cls._use_cached_range(freq, _normalized, start, end):
# Currently always False; never hit
# Should be reimplemented as a part of GH 17914
index = cls._cached_range(start, end, periods=periods,
freq=freq)
else:
index = _generate_regular_range(cls, start, end, periods, freq)

else:

if tz is not None:
# naive dates
if start is not None and start.tz is not None:
start = start.replace(tzinfo=None)

if end is not None and end.tz is not None:
end = end.replace(tzinfo=None)

if start and end:
if start.tz is None and end.tz is not None:
end = end.replace(tzinfo=None)

if end.tz is None and start.tz is not None:
start = start.replace(tzinfo=None)

if freq is not None:
if cls._use_cached_range(freq, _normalized, start, end):
index = cls._cached_range(start, end, periods=periods,
freq=freq)
else:
index = _generate_regular_range(cls, start, end,
periods, freq)

if tz is not None and getattr(index, 'tz', None) is None:
arr = conversion.tz_localize_to_utc(
ensure_int64(index.values),
@@ -302,12 +279,12 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
start = start.tz_localize(tz).asm8
if end is not None:
end = end.tz_localize(tz).asm8
else:
# Create a linearly spaced date_range in local time
start = start.tz_localize(tz)
end = end.tz_localize(tz)
arr = np.linspace(start.value, end.value, periods)
index = cls._simple_new(arr.astype('M8[ns]'), freq=None, tz=tz)
else:
# Create a linearly spaced date_range in local time
arr = np.linspace(start.value, end.value, periods)
index = cls._simple_new(
arr.astype('M8[ns]', copy=False), freq=None, tz=tz
)

if not left_closed and len(index) and index[0] == start:
index = index[1:]
@@ -1256,10 +1233,10 @@ def _generate_regular_range(cls, start, end, periods, freq):
data = cls._simple_new(data.view(_NS_DTYPE), None, tz=tz)
else:
tz = None
# start and end should have the same timezone by this point
if isinstance(start, Timestamp):
tz = start.tz

if isinstance(end, Timestamp):
elif isinstance(end, Timestamp):
tz = end.tz

xdr = generate_range(start=start, end=end,
@@ -1330,3 +1307,32 @@ def _maybe_normalize_endpoints(start, end, normalize):
_normalized = _normalized and end.time() == _midnight

return start, end, _normalized


def _maybe_localize_point(ts, is_none, is_not_none, freq, tz):
"""
Localize a start or end Timestamp to the timezone of the corresponding
start or end Timestamp

Parameters
----------
ts : start or end Timestamp to potentially localize
is_none : argument that should be None
is_not_none : argument that should not be None
freq : Tick, DateOffset, or None
tz : str, timezone object or None

Returns
-------
ts : Timestamp
"""
# Make sure start and end are timezone localized if:
# 1) freq = a Timedelta-like frequency (Tick)
# 2) freq = None i.e. generating a linspaced range
if isinstance(freq, Tick) or freq is None:
localize_args = {'tz': tz, 'ambiguous': False}
else:
localize_args = {'tz': None}
if is_none is None and is_not_none is not None:
ts = ts.tz_localize(**localize_args)
return ts
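
The effect of the new helper can be seen by running its logic in isolation. The sketch below copies the body of ``_maybe_localize_point`` from the diff into a stand-alone function (renamed ``maybe_localize_point`` here for illustration) and calls it with a Tick and a non-Tick frequency. It assumes an installed pandas in which ``Hour`` is a ``Tick`` subclass and ``MonthEnd`` is not; the chosen timestamps and frequencies are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import Tick


def maybe_localize_point(ts, is_none, is_not_none, freq, tz):
    """Stand-alone copy of the helper's logic, for illustration only."""
    if isinstance(freq, Tick) or freq is None:
        # Timedelta-like frequency (or a linspaced range): attach the target tz.
        localize_args = {"tz": tz, "ambiguous": False}
    else:
        # Any other offset: keep the endpoint naive at this stage.
        localize_args = {"tz": None}
    if is_none is None and is_not_none is not None:
        ts = ts.tz_localize(**localize_args)
    return ts


naive_start = pd.Timestamp("2016-10-30 00:00:00")

# Tick frequency: the naive endpoint is localized to the requested timezone.
hourly = maybe_localize_point(
    naive_start, naive_start.tz, naive_start, pd.offsets.Hour(), "Europe/Helsinki"
)

# Non-Tick frequency: the naive endpoint stays naive.
monthly = maybe_localize_point(
    naive_start, naive_start.tz, naive_start, pd.offsets.MonthEnd(), "Europe/Helsinki"
)

print(hourly.tz, monthly.tz)
```

Under these assumptions, only the Tick call returns a timezone-aware timestamp, which matches the comment in the helper about localizing endpoints for Timedelta-like frequencies.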
19 changes: 6 additions & 13 deletions pandas/core/indexes/datetimes.py
@@ -385,7 +385,10 @@ def _generate_range(cls, start, end, periods, name=None, freq=None,

@classmethod
def _use_cached_range(cls, freq, _normalized, start, end):
return _use_cached_range(freq, _normalized, start, end)
# Note: This always returns False
return (freq._should_cache() and
not (freq._normalize_cache and not _normalized) and
_naive_in_cache_range(start, end))

def _convert_for_op(self, value):
""" Convert value to be insertable to ndarray """
@@ -1580,7 +1583,7 @@ def date_range(start=None, end=None, periods=None, freq=None, tz=None,
Right bound for generating dates.
periods : integer, optional
Number of periods to generate.
freq : str or DateOffset, default 'D' (calendar daily)
freq : str or DateOffset, default 'D'
Frequency strings can have multiples, e.g. '5H'. See
:ref:`here <timeseries.offset_aliases>` for a list of
frequency aliases.
@@ -1861,17 +1864,7 @@ def _naive_in_cache_range(start, end):
else:
if start.tzinfo is not None or end.tzinfo is not None:
return False
return _in_range(start, end, _CACHE_START, _CACHE_END)


def _in_range(start, end, rng_start, rng_end):
return start > rng_start and end < rng_end


def _use_cached_range(freq, _normalized, start, end):
return (freq._should_cache() and
not (freq._normalize_cache and not _normalized) and
_naive_in_cache_range(start, end))
return start > _CACHE_START and end < _CACHE_END


def _time_to_micros(time):
2 changes: 1 addition & 1 deletion pandas/core/indexes/interval.py
@@ -1052,7 +1052,7 @@ def interval_range(start=None, end=None, periods=None, freq=None,
freq : numeric, string, or DateOffset, default None
The length of each interval. Must be consistent with the type of start
and end, e.g. 2 for numeric, or '5H' for datetime-like. Default is 1
for numeric and 'D' (calendar daily) for datetime-like.
for numeric and 'D' for datetime-like.
name : string, default None
Name of the resulting IntervalIndex
closed : {'left', 'right', 'both', 'neither'}, default 'right'
2 changes: 1 addition & 1 deletion pandas/core/indexes/period.py
@@ -840,7 +840,7 @@ def period_range(start=None, end=None, periods=None, freq='D', name=None):
Right bound for generating periods
periods : integer, default None
Number of periods to generate
freq : string or DateOffset, default 'D' (calendar daily)
freq : string or DateOffset, default 'D'
Frequency alias
name : string, default None
Name of the resulting PeriodIndex
2 changes: 1 addition & 1 deletion pandas/core/indexes/timedeltas.py
@@ -737,7 +737,7 @@ def timedelta_range(start=None, end=None, periods=None, freq=None,
Right bound for generating timedeltas
periods : integer, default None
Number of periods to generate
freq : string or DateOffset, default 'D' (calendar daily)
freq : string or DateOffset, default 'D'
Frequency strings can have multiples, e.g. '5H'
name : string, default None
Name of the resulting TimedeltaIndex