Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate properties module #17590

Merged
merged 4 commits into from
Sep 22, 2017
Merged

Separate properties module #17590

merged 4 commits into from
Sep 22, 2017

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jbrockmendel jbrockmendel mentioned this pull request Sep 19, 2017
4 tasks
@jreback
Copy link
Contributor

jreback commented Sep 19, 2017

why do you think we should do this?

@jbrockmendel
Copy link
Member Author

Because in upcoming tslibs PRs we will want to import/cimport cache_readonly, which will be much cleaner if it is in its own module instead of in lib. (since lib imports from tslib)

@jreback
Copy link
Contributor

jreback commented Sep 19, 2017

ok this looks fine then. ping on green.

@jreback jreback added the Internals Related to non-user accessible pandas implementation label Sep 19, 2017
@jreback jreback added this to the 0.21.0 milestone Sep 19, 2017
@codecov
Copy link

codecov bot commented Sep 19, 2017

Codecov Report

Merging #17590 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17590      +/-   ##
==========================================
- Coverage   91.22%    91.2%   -0.02%     
==========================================
  Files         163      163              
  Lines       49625    49625              
==========================================
- Hits        45270    45261       -9     
- Misses       4355     4364       +9
Flag Coverage Δ
#multiple 88.99% <ø> (ø) ⬆️
#single 40.19% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6630c4e...1c9f29d. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 19, 2017

Codecov Report

Merging #17590 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17590      +/-   ##
==========================================
- Coverage   91.19%   91.18%   -0.02%     
==========================================
  Files         163      163              
  Lines       49627    49627              
==========================================
- Hits        45259    45250       -9     
- Misses       4368     4377       +9
Flag Coverage Δ
#multiple 88.96% <100%> (ø) ⬆️
#single 40.2% <100%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 91.98% <100%> (ø) ⬆️
pandas/util/_decorators.py 66% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3087ef...3ad2512. Read the comment docs.

@jbrockmendel
Copy link
Member Author

ping.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2017

can u do an import time test before and after to see if adding another module actually matters

@jbrockmendel
Copy link
Member Author

can u do an import time test before and after to see if adding another module actually matters

Let me know if there's a less hacky way of measuring this.

import sys
mods = list(sys.modules)
def import_del():
	import pandas
	smods = sys.modules.keys()
	for name in smods:
		if name not in mods:
			del sys.modules[name]
	return

import timeit
timeit.timeit(import_del, number=100)

master --> 248.97569394111633
branch --> 246.3097460269928
0.20.3 --> 63.0059711933136

That last statistic is bothersome. Would it make a difference that 0.20.3 is installed in site-packages?

@jbrockmendel
Copy link
Member Author

Hmm looking at cProfile output I'm leaning towards disregarding the numbers above. Need to find a better way to measure this.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2017

using a profile that is not by default to avoid any startup files.

(pandas) bash-3.2$ ipython --profile foo
Python 3.6.1 |Continuum Analytics, Inc.| (default, Mar 22 2017, 19:25:17) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

IPython profile: foo

In [1]: %time import numpy
CPU times: user 177 ms, sys: 15.6 ms, total: 193 ms
Wall time: 193 ms

In [2]: %time import matplotlib
CPU times: user 46.6 ms, sys: 8.55 ms, total: 55.1 ms
Wall time: 59.1 ms

In [3]: %time import pandas
CPU times: user 308 ms, sys: 49.4 ms, total: 357 ms
Wall time: 366 ms

@jbrockmendel
Copy link
Member Author

OK, ran %time import pandas as pd four times apiece for master and branch.

master:

CPU times: user 978 ms, sys: 191 ms, total: 1.17 s
Wall time: 1.35 s
CPU times: user 1.09 s, sys: 153 ms, total: 1.24 s
Wall time: 1.36 s
CPU times: user 962 ms, sys: 121 ms, total: 1.08 s
Wall time: 1.15 s
CPU times: user 1.02 s, sys: 131 ms, total: 1.15 s
Wall time: 1.22 s

branch:

CPU times: user 1.02 s, sys: 163 ms, total: 1.18 s
Wall time: 1.36 s
CPU times: user 928 ms, sys: 120 ms, total: 1.05 s
Wall time: 1.12 s
CPU times: user 1.04 s, sys: 128 ms, total: 1.17 s
Wall time: 1.23 s
CPU times: user 952 ms, sys: 121 ms, total: 1.07 s
Wall time: 1.13 s

Branch has an edge by .06s, but that's all noise.

FWIW If import time is a priority, cProfile results for a single run:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.017    0.017    1.213    1.213 pandas/__init__.py:5(<module>)
        1    0.076    0.076    0.742    0.742 pandas/core/api.py:5(<module>)
        1    0.046    0.046    0.589    0.589 pandas/core/groupby.py:1(<module>)
        1    0.041    0.041    0.326    0.326 pandas/core/frame.py:10(<module>)
        1    0.002    0.002    0.205    0.205 pandas/core/index.py:2(<module>)
        1    0.063    0.063    0.203    0.203 pandas/core/indexes/api.py:1(<module>)
        1    0.001    0.001    0.171    0.171 pandas/core/series.py:3(<module>)
   324/47    0.007    0.000    0.152    0.003 {__import__}
        1    0.030    0.030    0.148    0.148 pandas/plotting/__init__.py:3(<module>)
        1    0.090    0.090    0.116    0.116 pandas/io/api.py:3(<module>)
        1    0.002    0.002    0.112    0.112 pandas/plotting/_converter.py:1(<module>)
      784    0.003    0.000    0.108    0.000 [..]/re.py:230(_compile)
        1    0.003    0.003    0.104    0.104 [..]/numpy/__init__.py:106(<module>)
      135    0.001    0.000    0.104    0.001 [..]/sre_compile.py:567(compile)
      135    0.000    0.000    0.102    0.001 [..]/re.py:192(compile)
        1    0.069    0.069    0.093    0.093 pandas/core/generic.py:2(<module>)
        1    0.000    0.000    0.084    0.084 [..]/numpy/add_newdocs.py:10(<module>)
        1    0.002    0.002    0.083    0.083 [..]/matplotlib/__init__.py:101(<module>)
        1    0.003    0.003    0.081    0.081 [..]/numpy/lib/__init__.py:1(<module>)
        1    0.000    0.000    0.071    0.071 [..]/numpy/lib/type_check.py:3(<module>)
        1    0.023    0.023    0.071    0.071 pandas/core/indexes/base.py:1(<module>)
        1    0.027    0.027    0.070    0.070 [..]/numpy/core/__init__.py:1(<module>)
        1    0.017    0.017    0.066    0.066 pandas/core/indexes/interval.py:1(<module>)
        1    0.000    0.000    0.061    0.061 pandas/util/_tester.py:3(<module>)
        1    0.002    0.002    0.060    0.060 [..]/pytest.py:4(<module>)
       15    0.000    0.000    0.059    0.004 [..]/subprocess.py:118(_eintr_retry_call)
      135    0.000    0.000    0.058    0.000 [..]/sre_compile.py:552(_code)
        1    0.006    0.006    0.053    0.053 [..]/matplotlib/rcsetup.py:15(<module>)
        1    0.003    0.003    0.052    0.052 pandas/compat/__init__.py:25(<module>)

groupby imports frame and series, so we can't attribute anything to that (though it does make a bunch of exec calls)

~5% could be trimmed by making pytest a lazy import in util._tester.

Lazy import of matplotlib would be a win. Not sure how hard that would be. I think I recall @TomAugspurger mentioning this at some point.

A bunch of subprocess calls made by io.clipboards at import could probably be lazified.

No idea where are the regexps are getting compiled.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2017

yes mpl is the big one
clipboard could also be lazily imported
though have to be careful not to break stuff

anyway that's for another issue

@@ -67,6 +67,7 @@ import tslib
from tslib import NaT, Timestamp, Timedelta
import interval
from interval import Interval
from properties import AxisProperty, cache_readonly # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may be able to remove these imports entity and just change where they r called from

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It looks like the only direct import of lib.cache_readonly is by util._decorators, and other imports get it from there. So that would only require changing it in one place. And lib.AxisProperty is only used in core.generic, so also an easy change. Pls confirm before I make this new change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep jose would be good

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 22, 2017 via email

@jreback
Copy link
Contributor

jreback commented Sep 22, 2017

@TomAugspurger do you have an issue for reducing import times? let's move discussion there

@jreback jreback merged commit 9732af2 into pandas-dev:master Sep 22, 2017
@jreback
Copy link
Contributor

jreback commented Sep 22, 2017

thanks!

@jbrockmendel jbrockmendel deleted the properties branch October 30, 2017 16:23
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants