-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idea: Make pd.Index.values not writable #33001
Comments
As is want to happen, this looks more complicated than I expected. After changing the
|
Do the tracebacks show what cython functions are raising? usually that means we need to change a type annotation in that function |
Yes, it's a bunch of functions. I could try and collect those if that would be useful. Here's one example: pandas/tests/indexes/datetimes/test_join.py:42:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/indexes/base.py:3441: in join
self, how=how, level=level, return_indexers=return_indexers
pandas/core/indexes/datetimelike.py:856: in join
sort=sort,
pandas/core/indexes/base.py:3451: in join
return this.join(other, how=how, return_indexers=return_indexers)
pandas/core/indexes/base.py:3471: in join
other, how=how, return_indexers=return_indexers
pandas/core/indexes/base.py:3778: in _join_monotonic
lidx = self._left_indexer_unique(ov, sv)
pandas/core/indexes/base.py:239: in _left_indexer_unique
return libjoin.left_join_indexer_unique(left, right)
pandas/_libs/join.pyx:270: in pandas._libs.join.left_join_indexer_unique
def left_join_indexer_unique(join_t[:] left, join_t[:] right):
stringsource:658: in View.MemoryView.memoryview_cwrapper
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E ValueError: buffer source array is read-only
stringsource:349: ValueError So I thought this could be fixable by changing the signature from
to
But it turns out doing that raises this fun exception during compilation:
Full compilation tracebackCompiling pandas/_libs/join.pyx because it changed.
[1/1] Cythonizing pandas/_libs/join.pyx
Error compiling Cython file:
------------------------------------------------------------
...
# right might contain non-unique values
@cython.wraparound(False)
@cython.boundscheck(False)
def left_join_indexer_unique(const join_t[:] left, const join_t[:] right):
^
------------------------------------------------------------
pandas/_libs/join.pyx:280:42: Compiler crash in AnalyseDeclarationsTransform
File 'ModuleNode.py', line 124, in analyse_declarations: ModuleNode(join.pyx:1:0,
full_module_name = 'pandas._libs.join')
File 'Nodes.py', line 431, in analyse_declarations: StatListNode(join.pyx:1:0)
File 'Nodes.py', line 431, in analyse_declarations: StatListNode(join.pyx:280:0)
File 'Nodes.py', line 375, in analyse_declarations: CompilerDirectivesNode(join.pyx:280:0)
File 'Nodes.py', line 431, in analyse_declarations: StatListNode(join.pyx:280:0)
File 'Nodes.py', line 2887, in analyse_declarations: DefNode(join.pyx:280:0,
modifiers = [...]/0,
name = 'left_join_indexer_unique',
num_required_args = 2,
outer_attrs = [...]/2,
py_wrapper_required = True,
reqd_kw_flags_cname = '0')
File 'Nodes.py', line 2925, in analyse_argument_types: DefNode(join.pyx:280:0,
modifiers = [...]/0,
name = 'left_join_indexer_unique',
num_required_args = 2,
outer_attrs = [...]/2,
py_wrapper_required = True,
reqd_kw_flags_cname = '0')
File 'Nodes.py', line 1093, in analyse: MemoryViewSliceTypeNode(join.pyx:280:42,
name = 'memoryview')
Compiler crash traceback from this point on:
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/Nodes.py", line 1093, in analyse
self.type = PyrexTypes.MemoryViewSliceType(base_type, axes_specs)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 625, in __init__
self.dtype_name = Buffer.mangle_dtype_name(self.dtype)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/Buffer.py", line 631, in mangle_dtype_name
return prefix + dtype.specialization_name()
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 57, in specialization_name
common_subs = (self.empty_declaration_code()
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 51, in empty_declaration_code
self._empty_declaration = self.declaration_code('')
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 1582, in declaration_code
return self.const_base_type.declaration_code("const %s" % entity_code, for_display, dll_linkage, pyrex)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 1649, in declaration_code
raise Exception("This may never happen, please report a bug")
Exception: This may never happen, please report a bug
Traceback (most recent call last):
File "setup.py", line 791, in <module>
setup_package()
File "setup.py", line 761, in setup_package
ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
File "setup.py", line 534, in maybe_cythonize
return cythonize(extensions, *args, **kwargs)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
cythonize_one(*args)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: pandas/_libs/join.pyx
(pandas-dev) isaac@Mimir:~/github/pandas ‹master›
$ python3 setup.py build_ext --inplace -j 12
Compiling pandas/_libs/join.pyx because it changed.
[1/1] Cythonizing pandas/_libs/join.pyx
Error compiling Cython file:
------------------------------------------------------------
...
# right might contain non-unique values
@cython.wraparound(False)
@cython.boundscheck(False)
def left_join_indexer_unique(const join_t[:] left, const join_t[:] right):
^
------------------------------------------------------------
pandas/_libs/join.pyx:270:42: Compiler crash in AnalyseDeclarationsTransform
File 'ModuleNode.py', line 124, in analyse_declarations: ModuleNode(join.pyx:1:0,
full_module_name = 'pandas._libs.join')
File 'Nodes.py', line 431, in analyse_declarations: StatListNode(join.pyx:1:0)
File 'Nodes.py', line 431, in analyse_declarations: StatListNode(join.pyx:270:0)
File 'Nodes.py', line 375, in analyse_declarations: CompilerDirectivesNode(join.pyx:270:0)
File 'Nodes.py', line 431, in analyse_declarations: StatListNode(join.pyx:270:0)
File 'Nodes.py', line 2887, in analyse_declarations: DefNode(join.pyx:270:0,
modifiers = [...]/0,
name = 'left_join_indexer_unique',
num_required_args = 2,
outer_attrs = [...]/2,
py_wrapper_required = True,
reqd_kw_flags_cname = '0')
File 'Nodes.py', line 2925, in analyse_argument_types: DefNode(join.pyx:270:0,
modifiers = [...]/0,
name = 'left_join_indexer_unique',
num_required_args = 2,
outer_attrs = [...]/2,
py_wrapper_required = True,
reqd_kw_flags_cname = '0')
File 'Nodes.py', line 1093, in analyse: MemoryViewSliceTypeNode(join.pyx:270:42,
name = 'memoryview')
Compiler crash traceback from this point on:
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/Nodes.py", line 1093, in analyse
self.type = PyrexTypes.MemoryViewSliceType(base_type, axes_specs)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 625, in __init__
self.dtype_name = Buffer.mangle_dtype_name(self.dtype)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/Buffer.py", line 631, in mangle_dtype_name
return prefix + dtype.specialization_name()
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 57, in specialization_name
common_subs = (self.empty_declaration_code()
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 51, in empty_declaration_code
self._empty_declaration = self.declaration_code('')
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 1582, in declaration_code
return self.const_base_type.declaration_code("const %s" % entity_code, for_display, dll_linkage, pyrex)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Compiler/PyrexTypes.py", line 1649, in declaration_code
raise Exception("This may never happen, please report a bug")
Exception: This may never happen, please report a bug
Traceback (most recent call last):
File "setup.py", line 791, in <module>
setup_package()
File "setup.py", line 761, in setup_package
ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
File "setup.py", line 534, in maybe_cythonize
return cythonize(extensions, *args, **kwargs)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
cythonize_one(*args)
File "/Users/isaac/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: pandas/_libs/join.pyx |
There was a long-standing issue in Cython where it didn't used to support read-only memory views, but that was fixed a few years ago now: I don't know what minimum Cython version is required these days to build pandas from source, but it seems like it should probably be OK to use read-only views in pandas now? @ivirshup Were you using at least Cython version 0.28 when you encountered the error? |
@shoyer yes, this was using cython v0.29.16 (I'm using the Those are the docs that made me think I could just chuck a It looks like @jbrockmendel may have run into this issue before cython/cython#3222. |
For fused types like |
I see that this is fixed in: cython/cython#3118. It looks like this was not considered a bug fix. Would pandas put a hard restriction on cython |
We distribute wheels, so don't have cython as an install requirement. we are pretty aggressive in bumping it as a dev requirement. I'm also looking forward to conditional nogil in 0.30 |
Regardless of the Cython issues, is this something we actually want to do? What are the implications of this? When we create a Series from an Index (say via a Keep in mind that we'd like to support indexes backed by non-ndarrays some day, which won't necessarily have the same API / capability for marking data as non-writable. |
If you make a series from an index, you should be making a copy of the underlying data, so yes it would be writable. I think the current sharing of references, with the false assumption of immutability, can lead to some very bad behavior. I think if One solution for array classes which do not support immutability would be to return an instance of an immutable view wrapper class. Another would be to return copies of the data from Update: I've opened a PR for this, mainly so it's easy to share what the barriers are. |
Strong 👍 from me. "False immutability" has resulted in a number of subtle downstream bugs in xarray, for example. I think it would be better to make an extra array copy when converting from an |
@TomAugspurger, did you have more thoughts on the feasibility/ desirability of immutability enforcement? I wasn't sure if I should be putting more effort into this yet. |
I think it's a good goal, but I vaguely recall issues with passing readonly
arrays to Cython. Perhaps most of those have been fixed.
To test, I'd recommend temporarily changing the Index constructors to all
mark their arrays as immutable and run the full test suite to see what
breaks.
…On Thu, Apr 16, 2020 at 3:48 AM Isaac Virshup ***@***.***> wrote:
@TomAugspurger <https://github.com/TomAugspurger>, did you have more
thoughts on the feasibility/ desirability of immutability enforcement? I
wasn't sure if I should be putting more effort into this yet.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#33001 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOIVPIYOKB2OLD5XVSXTRM3A5BANCNFSM4LTH5AVQ>
.
|
Code Sample, a copy-pastable example if possible
Problem description
Index
objects are assumed to be immutable, but wrap a mutable object. In my experience, it's pretty common to just grab the.values
from pandas object. This has caused surprising behaviour in my own code, and has been discussed here before #19862.I would like to make sure that the values numpy array for index objects has it's
WRITABLE
flag set to false, to makeIndex
object more immutable.Does this sound reasonable? If so, I can give a PR a shot.
Expected Output
Output of
pd.show_versions()
[paste the output of
pd.show_versions()
here below this line]INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.2
pytz : 2018.9
dateutil : 2.7.5
pip : 20.0.2
setuptools : 46.0.0
Cython : 0.29.14
pytest : 5.3.5
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.3
xarray : 0.15.0
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0
The text was updated successfully, but these errors were encountered: