ENH: GH17054: read_html() handles rowspan/colspan and infers headers #17089

Closed
145 commits
4bf2f2e
ENH: GH17054: read_html() handles rowspan/colspan and infers headers
jowens Jul 26, 2017
80d9c2b
in python 3, lambdas no longer take tuples as args. thanks pep 3113.
jowens Jul 27, 2017
26d1f6a
fixing lint error
jowens Jul 27, 2017
37af4ea
in python3, zip does not return a list, so list(zip(...))
jowens Jul 27, 2017
86dee93
Merge branch 'master' into read_html_with_colspan_rowspan
jowens Aug 29, 2017
d3eca72
Merge branch 'master' into read_html_with_colspan_rowspan
jowens Sep 6, 2017
f064562
documentation changes only
jowens Sep 6, 2017
67c8a59
Merge branch 'read_html_with_colspan_rowspan' of github.com:jowens/pa…
jowens Sep 6, 2017
5a38278
documentation changes only
jowens Sep 7, 2017
39f7814
documentation changes only, limited to 80 cols
jowens Sep 7, 2017
531863f
more documentation edits
jowens Sep 8, 2017
818d394
minor documentation edits
jowens Sep 9, 2017
f3a6aa3
better return type explanation in code, added issue number to tests
jowens Sep 9, 2017
2f904b2
cleaning up legacy documentation issues
jowens Sep 18, 2017
f4e7592
remove 'if'
jowens Sep 18, 2017
293d9e4
newlines for clarity
jowens Sep 18, 2017
efabae4
DOC: whatsnew typos
jreback Jul 26, 2017
552677f
ENH: GH17054: read_html() handles rowspan/colspan and infers headers
jowens Jul 26, 2017
1aacf17
TST: Check more error messages in tests (#17075)
gfyoung Jul 26, 2017
359890f
BUG: Respect dtype when calling pivot_table with margins=True
toobaz Jul 26, 2017
3fd2612
MAINT: Add missing space in parsers.pyx
gfyoung Jul 27, 2017
76249bf
MAINT: Add missing paren around print statement
gfyoung Jul 27, 2017
77d16d4
DOC: fix typos in missing.rst
jreback Jul 27, 2017
bd50a4f
in python 3, lambdas no longer take tuples as args. thanks pep 3113.
jowens Jul 27, 2017
452e08d
fixing lint error
jowens Jul 27, 2017
ecfaa4c
in python3, zip does not return a list, so list(zip(...))
jowens Jul 27, 2017
69cd83c
DOC: further clean-up null/na changes (#17113)
jorisvandenbossche Jul 29, 2017
1e5cfa1
BUG: Allow pd.unique to accept tuple of strings (#17108)
mroeschke Jul 30, 2017
c502dba
BUG: Allow Series with same name with crosstab (#16028)
mroeschke Jul 30, 2017
2155c3e
COMPAT: make sure use_inf_as_null is deprecated (#17126)
jreback Aug 1, 2017
3ed9f53
CI: bump version of xlsxwriter to 0.5.2 (#17142)
jreback Aug 1, 2017
9a50c21
DOC: Clean up instructions in ISSUE_TEMPLATE (#17146)
gfyoung Aug 1, 2017
5759eff
Add missing space to the NotImplementedError's message for compound d…
FKint Aug 1, 2017
3855039
DOC: (de)type the return value of concat (#17079) (#17119)
jebob Aug 1, 2017
d7cb627
BUG: Thoroughly dedup column names in read_csv (#17095)
gfyoung Aug 1, 2017
9d32df6
DOC: Additions/updates to documentation (#17150)
alanyee Aug 2, 2017
5ce00e1
ENH: add to/from_parquet with pyarrow & fastparquet (#15838)
jreback Aug 2, 2017
9aadb64
DOC: doc typos, xref #15838
jreback Aug 2, 2017
89fa421
TST: test for categorical index monotonicity (#17152)
jreback Aug 3, 2017
ccdae36
MAINT: Remove non-standard and inconsistently-used imports (#17085)
jbrockmendel Aug 3, 2017
5b42bdf
DOC: typos in whatsnew
Aug 3, 2017
56957cf
DOC: whatsnew 0.21.0 fixes
jreback Aug 3, 2017
d2e21c3
BUG: Fix CSV parsing of singleton list header (#17090)
Aug 3, 2017
20487bf
ENH: Support strings containing '%' in add_prefix/add_suffix (#17151)…
jschendel Aug 3, 2017
b4b4c77
REF: repr - allow block to override values that get formatted (#17143)
jorisvandenbossche Aug 4, 2017
b720f0d
MAINT: Drop unnecessary newlines in issue template
gfyoung Aug 7, 2017
43dab45
remove direct import of nan
jbrockmendel Aug 7, 2017
94a734a
use == to test String equality (#17171)
jhelie Aug 7, 2017
e143ee1
ENH: Add warning when setting into nonexistent attribute (#16951)
deniederhut Aug 7, 2017
5a523bb
DOC: added string processing comparison with SAS (#16497)
natethedrummer Aug 7, 2017
0bfad7c
CLN: remove unused get methods in internals (#17169)
jbrockmendel Aug 7, 2017
a4e4909
TST: Partial Boolean DataFrame Indexing (#17186)
mroeschke Aug 7, 2017
e8fab8a
CLN: Reformat docstring for IPython fixture
gfyoung Aug 7, 2017
d089d44
Define Series.plot and Series.hist in class definition (#17199)
jbrockmendel Aug 8, 2017
b09b274
BUG: support pandas objects in iloc with old numpy versions (#17194)
toobaz Aug 8, 2017
cc8c5d7
Implement _make_accessor classmethod for PandasDelegate (#17166)
jbrockmendel Aug 8, 2017
df9710b
Create ABCDateOffset (#17165)
jbrockmendel Aug 9, 2017
e71e6d7
BUG: resample and apply modify the index type for empty Series (#17149)
discort Aug 9, 2017
e9c7f29
DOC: Updated NDFrame.astype docs (#17203)
topper-123 Aug 9, 2017
38293d3
MAINT: Minor touch-ups to GitHub PULL_REQUEST_TEMPLATE (#17207)
dhimmel Aug 9, 2017
7280e6c
CLN: replace %s syntax with .format in core.computation (#17209)
jschendel Aug 10, 2017
421dcf4
Bugfix for multilevel columns with empty strings in Python 2 (#17099)
chrisjbillington Aug 10, 2017
d5733ee
CLN/ASV clean-up frame stat ops benchmarks (#17205)
jorisvandenbossche Aug 10, 2017
9f69583
BUG: Rolling apply on DataFrame with Datetime index returns NaN (#17156)
FXocena Aug 10, 2017
1e1ce40
CLN: Remove import exception handling (#17218)
dhimmel Aug 10, 2017
a1509dc
MAINT: Remove extra the's in deprecation messages (#17222)
gfyoung Aug 11, 2017
6788533
DOC: Patch docs in _decorators.py
gfyoung Aug 11, 2017
619e031
CLN: replace %s syntax with .format in pandas.util (#17224)
jschendel Aug 11, 2017
9e26997
Add 'See also' sections (#17223)
topper-123 Aug 11, 2017
a7311d2
move pivot_table doc-string to DataFrame (#17174)
jbrockmendel Aug 11, 2017
1ac9ede
Remove import of pandas as pd in core.window (#17233)
jbrockmendel Aug 12, 2017
a2d8d23
TST: Move more frame tests to SharedWithSparse (#17227)
kernc Aug 12, 2017
013b983
REF: _get_objs_combined_axis (#17217)
toobaz Aug 12, 2017
fddb66d
ENH/PERF: Remove frequency inference from .dt accessor (#17210)
cpcloud Aug 14, 2017
2e55156
Fix apparent typo in tests (#17247)
jbrockmendel Aug 14, 2017
b49446e
COMPAT: avoid calling getsizeof() on PyPy
mattip Aug 15, 2017
536b761
CLN: replace %s syntax with .format in pandas.core.reshape (#17252)
jschendel Aug 15, 2017
a1ff671
ENH: Infer compression from non-string paths (#17206)
dhimmel Aug 15, 2017
df1b0dc
Fix bugs in IntervalIndex.is_non_overlapping_monotonic (#17238)
jschendel Aug 15, 2017
8fe1cc3
BUG: Fix behavior of argmax and argmin with inf (#16449) (#16449)
DGrady Aug 15, 2017
357e7ae
CLN: Remove have_pytz (#17266)
jbrockmendel Aug 16, 2017
aa97aa6
CLN: replace %s syntax with .format in core.dtypes and core.sparse (#…
jschendel Aug 17, 2017
a618bec
Replace imports of * with explicit imports (#17269)
jbrockmendel Aug 17, 2017
db3ea2f
TST: pytest deprecation warnings GH17197 (#17253)
swyoon Aug 17, 2017
de60666
Handle more date/datetime/time formats (#15871)
Winand Aug 18, 2017
0bbda54
DOC: add example on json_normalize (#16438)
zzgao Aug 18, 2017
c148dd2
BUG: Have object dtype for empty Categorical.categories (#17249)
TomAugspurger Aug 19, 2017
155c11a
CLN: replace %s syntax with .format in pandas.tseries (#17290)
jschendel Aug 19, 2017
e4aeed2
TST: parameterize consistency tests for rolling/expanding windows (#1…
jreback Aug 19, 2017
db11418
FIX: define `DataFrame.items` for all versions of python (#17214)
tacaswell Aug 19, 2017
a256e26
PERF: Update ASV publish config (#17293)
TomAugspurger Aug 20, 2017
75d46a6
DOC: Expand docstrings for head / tail methods (#16941)
yosukeBaya4 Aug 21, 2017
172abfb
MAINT: Use set literal for unsupported + depr args
gfyoung Aug 21, 2017
1982aca
DOC: Add proper docstring to maybe_convert_indices
gfyoung Aug 21, 2017
393bb19
DOC: Improving docstring of take method (#16948)
matagus Aug 21, 2017
595e0a4
BUG: Fixed regex in asv.conf.json (#17300)
TomAugspurger Aug 21, 2017
6a45d36
Remove unnecessary usage of _TSObject (#17297)
jbrockmendel Aug 21, 2017
5f077f3
BUG: clip should handle null values
mgasvoda Aug 21, 2017
a10fa92
BUG: fillna returns frame when inplace=True if value is a dict (#1615…
Aug 21, 2017
8dfb95b
CLN: Index.append() refactoring (#16236)
toobaz Aug 22, 2017
8326c83
DEPS: set min versions (#17002)
jreback Aug 22, 2017
8fbd8f8
CLN: replace %s syntax with .format in core.tools, algorithms.py, bas…
jschendel Aug 22, 2017
3625190
BUG: Fix strange behaviour of Series.iloc on MultiIndex Series (#1714…
Aug 22, 2017
7364711
DOC: Add module doc-string to tseries/api.py
gfyoung Aug 23, 2017
e5797fa
MAINT: Clean up docs in pandas/errors/__init__.py
gfyoung Aug 23, 2017
9be531a
CLN: replace %s syntax with .format in missing.py, nanops.py, ops.py …
jschendel Aug 24, 2017
a9574b0
Make pd.Period immutable (#17239)
jbrockmendel Aug 24, 2017
3e31383
Bug: groupby multiindex levels equals rows (#16859)
Aug 24, 2017
e5030b3
BUG: Cannot use tz-aware origin in to_datetime (#16842)
ivybae Aug 24, 2017
7be53ed
Replace usage of total_seconds compat func with timedelta method (#17…
jbrockmendel Aug 25, 2017
f4adbb9
CLN: replace %s syntax with .format in core/indexing.py (#17357)
cbertinato Aug 28, 2017
b1b3325
DOC: Point to dev-docs in issue template (#17353)
gfyoung Aug 28, 2017
76cc924
CLN: remove total_seconds compat from json (#17341)
chris-b1 Aug 29, 2017
0309dae
CLN: Move test_intersect_str_dates (#17366)
jschendel Aug 29, 2017
c523bfc
BUG: Respect dups in reindexing CategoricalIndex (#17355)
gfyoung Aug 29, 2017
5a6f2ac
Unify Index._dir_* with Series implementation (#17117)
jbrockmendel Aug 29, 2017
ce8ccba
BUG: make order of index from pd.concat deterministic (#17364)
toobaz Aug 29, 2017
a585e09
Fix typo that causes several NaT methods to have incorrect docstrings…
jbrockmendel Aug 29, 2017
8199559
CLN: replace %s syntax with .format in io/formats/format.py (#17358)
cbertinato Aug 30, 2017
6ec1044
PKG: Added pyproject.toml for PEP 518 (#16745)
TomAugspurger Aug 30, 2017
c33af56
DOC: Update Overview page in documentation (#17368)
iuliakhomenko Aug 30, 2017
0f8205c
API: Have MultiIndex consturctors always return a MI (#17236)
TomAugspurger Aug 30, 2017
54f68b4
CLN: replace %s syntax with .format in io/formats/css.py, excel.py, p…
cbertinato Aug 31, 2017
b717ebc
BUG: not correctly using OrderedDict in test_series_apply (#17384)
sylviawhoa Aug 31, 2017
b61af0e
Remove boxplot from _dataframe_apply_whitelist (#17381)
jbrockmendel Aug 31, 2017
c80e8d0
API: Localize Series when calling to_datetime with utc=True (#6415) (…
mroeschke Sep 1, 2017
3a0dc92
TST: Enable tests in test_tools.py (#17405)
jschendel Sep 1, 2017
365f2fe
TST: remove tests and docs for legacy (pre 0.12) hdf5 support (#17404)
topper-123 Sep 1, 2017
d994323
Tslib unused (#17402)
jbrockmendel Sep 1, 2017
e94e572
DOC: Cleaned references to pandas <v0.12 in docs (#17375)
topper-123 Sep 2, 2017
6a02ffa
Remove unused _day and _month attrs (#17431)
jbrockmendel Sep 4, 2017
519c57f
DOC: Clean-up references to v12 to v14 (both included) (#17420)
topper-123 Sep 5, 2017
f22b895
BUG: Plotting Timedelta on y-axis #16953 (#17430)
s-weigand Sep 6, 2017
8edd85a
COMPAT: handle pyarrow deprecation of timestamps_to_ms in .from_panda…
jreback Sep 6, 2017
047727a
DOC/TST: Add examples to MultiIndex.get_level_values + related change…
topper-123 Sep 6, 2017
91a2300
documentation changes only
jowens Sep 6, 2017
41058ab
documentation changes only
jowens Sep 7, 2017
4926913
documentation changes only, limited to 80 cols
jowens Sep 7, 2017
14235ec
more documentation edits
jowens Sep 8, 2017
196c835
minor documentation edits
jowens Sep 9, 2017
fed4b03
better return type explanation in code, added issue number to tests
jowens Sep 9, 2017
c2d9cc6
cleaning up legacy documentation issues
jowens Sep 18, 2017
d4b213b
remove 'if'
jowens Sep 18, 2017
b16f6d5
newlines for clarity
jowens Sep 18, 2017
092889a
Merge branch 'read_html_with_colspan_rowspan' of github.com:jowens/pa…
jowens Sep 20, 2017
DOC: further clean-up null/na changes (#17113)
jorisvandenbossche authored and jowens committed Sep 20, 2017
commit 69cd83cec11093c3553abf279bebf8ad2b33fc0a
4 changes: 2 additions & 2 deletions doc/source/basics.rst
@@ -511,7 +511,7 @@ optional ``level`` parameter which applies only if the object has a
:header: "Function", "Description"
:widths: 20, 80

-``count``, Number of non-na observations
+``count``, Number of non-NA observations
``sum``, Sum of values
``mean``, Mean of values
``mad``, Mean absolute deviation
@@ -541,7 +541,7 @@ will exclude NAs on Series input by default:
np.mean(df['one'].values)

``Series`` also has a method :meth:`~Series.nunique` which will return the
-number of unique non-na values:
+number of unique non-NA values:

.. ipython:: python

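As a quick illustration of the two table entries touched in this file, the following is a minimal sketch of how ``count`` and ``nunique`` treat NA values. The sample data and variable names are made up for illustration and are not part of the patch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two missing entries (NaN and None) and one repeated value.
s = pd.Series([1.0, np.nan, 2.0, 2.0, None])

# count() reports the number of non-NA observations.
non_na_count = s.count()

# nunique() reports the number of unique non-NA values by default.
unique_non_na = s.nunique()

# Passing dropna=False counts NA as one additional distinct value.
unique_with_na = s.nunique(dropna=False)

print(non_na_count, unique_non_na, unique_with_na)
```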
5 changes: 0 additions & 5 deletions doc/source/io.rst
@@ -137,7 +137,6 @@ usecols : array-like or callable, default ``None``

Using this parameter results in much faster parsing time and lower memory usage.
as_recarray : boolean, default ``False``
-
.. deprecated:: 0.18.2

Please call ``pd.read_csv(...).to_records()`` instead.
@@ -193,7 +192,6 @@ skiprows : list-like or integer, default ``None``
skipfooter : int, default ``0``
Number of lines at bottom of file to skip (unsupported with engine='c').
skip_footer : int, default ``0``
-
.. deprecated:: 0.19.0

Use the ``skipfooter`` parameter instead, as they are identical
@@ -208,13 +206,11 @@ low_memory : boolean, default ``True``
use the ``chunksize`` or ``iterator`` parameter to return the data in chunks.
(Only valid with C parser)
buffer_lines : int, default None
-
.. deprecated:: 0.19.0

Argument removed because its value is not respected by the parser

compact_ints : boolean, default False
-
.. deprecated:: 0.19.0

Argument moved to ``pd.to_numeric``
@@ -223,7 +219,6 @@ compact_ints : boolean, default False
parser will attempt to cast it as the smallest integer ``dtype`` possible, either
signed or unsigned depending on the specification from the ``use_unsigned`` parameter.
use_unsigned : boolean, default False
-
.. deprecated:: 0.18.2

Argument moved to ``pd.to_numeric``
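The deprecation notes in this file each point to a replacement. The sketch below shows one way those replacements might look in practice, assuming a small in-memory CSV whose contents and variable names are invented for illustration. It uses ``skipfooter`` with the Python engine (the docs above note it is unsupported with ``engine='c'``), ``to_records()`` in place of ``as_recarray``, and ``pd.to_numeric`` with ``downcast`` in place of ``compact_ints`` / ``use_unsigned``.

```python
import io

import pandas as pd

# Hypothetical CSV text with a trailing summary row that should be skipped.
csv_text = "a,b\n1,2\n3,4\ntotal,6\n"

# skipfooter replaces the deprecated skip_footer; it requires the Python engine.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

# to_records() replaces the deprecated as_recarray=True.
records = df.to_records(index=False)

# pd.to_numeric(..., downcast=...) replaces compact_ints / use_unsigned.
small_signed = pd.to_numeric(df["a"], downcast="integer")
small_unsigned = pd.to_numeric(df["a"], downcast="unsigned")

print(records)
print(small_signed.dtype, small_unsigned.dtype)
```

Downcasting is done per column here, which keeps the choice between signed and unsigned explicit rather than global to the parse.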
2 changes: 1 addition & 1 deletion doc/source/missing_data.rst
@@ -36,7 +36,7 @@ When / why does data become missing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some might quibble over our usage of *missing*. By "missing" we simply mean
-**NA** or "not present for whatever reason". Many data sets simply arrive with
+**NA** ("not available") or "not present for whatever reason". Many data sets simply arrive with
missing data, either because it exists and was not collected or it never
existed. For example, in a collection of financial time series, some of the time
series might start on different dates. Thus, values prior to the start date
46 changes: 38 additions & 8 deletions doc/source/whatsnew/v0.10.0.txt
@@ -128,15 +128,45 @@ labeled the aggregated group with the end of the interval: the next day).
``notnull``. That they ever were was a relic of early pandas. This behavior
can be re-enabled globally by the ``mode.use_inf_as_null`` option:

-.. ipython:: python
+.. code-block:: ipython

-s = pd.Series([1.5, np.inf, 3.4, -np.inf])
-pd.isnull(s)
-s.fillna(0)
-pd.set_option('use_inf_as_null', True)
-pd.isnull(s)
-s.fillna(0)
-pd.reset_option('use_inf_as_null')
+In [6]: s = pd.Series([1.5, np.inf, 3.4, -np.inf])
+
+In [7]: pd.isnull(s)
+Out[7]:
+0    False
+1    False
+2    False
+3    False
+Length: 4, dtype: bool
+
+In [8]: s.fillna(0)
+Out[8]:
+0    1.500000
+1         inf
+2    3.400000
+3        -inf
+Length: 4, dtype: float64
+
+In [9]: pd.set_option('use_inf_as_null', True)
+
+In [10]: pd.isnull(s)
+Out[10]:
+0    False
+1     True
+2    False
+3     True
+Length: 4, dtype: bool
+
+In [11]: s.fillna(0)
+Out[11]:
+0    1.5
+1    0.0
+2    3.4
+3    0.0
+Length: 4, dtype: float64
+
+In [12]: pd.reset_option('use_inf_as_null')

- Methods with the ``inplace`` option now all return ``None`` instead of the
calling object. E.g. code written like ``df = df.fillna(0, inplace=True)``
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.4.x.txt
@@ -9,7 +9,7 @@ New Features
- Added Python 3 support using 2to3 (:issue:`200`)
- :ref:`Added <dsintro.name_attribute>` ``name`` attribute to ``Series``, now
prints as part of ``Series.__repr__``
-- :ref:`Added <missing.isnull>` instance methods ``isnull`` and ``notnull`` to
+- :ref:`Added <missing.isna>` instance methods ``isnull`` and ``notnull`` to
Series (:issue:`209`, :issue:`203`)
- :ref:`Added <basics.align>` ``Series.align`` method for aligning two series
with choice of join method (ENH56_)
4 changes: 2 additions & 2 deletions pandas/core/config_init.py
@@ -398,8 +398,8 @@ def table_schema_cb(key):

use_inf_as_na_doc = """
: boolean
-True means treat None, NaN, INF, -INF as na (old way),
-False means None and NaN are null, but INF, -INF are not na
+True means treat None, NaN, INF, -INF as NA (old way),
+False means None and NaN are null, but INF, -INF are not NA
(new way).
"""

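The option text above distinguishes the default ("new way") from the "old way" of treating infinities as NA. The sketch below illustrates only the default behavior with ``pd.isna``. The second mask uses an explicit ``replace`` of infinities as a stand-in for the old behavior; that is an assumption made for illustration, not how the option itself is implemented, and the sample data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing a finite value, infinities, None and NaN.
s = pd.Series([1.5, np.inf, None, np.nan, -np.inf])

# Default ("new way"): None and NaN are NA, INF and -INF are not.
default_mask = pd.isna(s)

# Stand-in for the "old way": map infinities to NaN first, then test for NA.
inf_as_na_mask = pd.isna(s.replace([np.inf, -np.inf], np.nan))

print(default_mask.tolist())
print(inf_as_na_mask.tolist())
```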