Issues with overlapping MultiIndex intervals #27456
Please edit the top post to include a fully reproducible example and version info.
I added the version info in the Stack Overflow question. Please let me know if you need additional info. Thanks.
@mahdirajabi96 we've got over 3000 issues to triage; pointing to Stack Overflow practically ensures this stays at the bottom of the list. Please follow @jreback's request.
I am trying to query an IntervalIndex with a scalar value. If all the intervals are defined with integer edges it works, but with float edges it doesn't.
Versions:
This code would work if all the intervals in df were integers.
@mahdirajabi96 please try your example on 0.25, which substantially changed the handling of overlapping intervals.
The intervals shouldn't overlap for a given Item and RID, and I don't think they do. Here is the df output:
Route ID 1 (RID1) extends from milepost 0 to 2, and Route ID 2 (RID2) extends from milepost 10 to 12. Each route has two attributes, FC and OWNER.
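The following is a minimal sketch that makes the described layout concrete. It builds a two-level (RID, milepost) frame with `pd.MultiIndex.from_arrays` and an `IntervalIndex` level. Several details are assumptions for illustration only:

- the row values, the 1-mile segmentation, and the level names;
- the `closed="left"` choice;
- leaving out the Item level.

The check uses `IntervalIndex.is_overlapping` to confirm, route by route, that no route has overlapping milepost intervals.

```python
import pandas as pd

# Hypothetical rows modeled on the description (values are made up):
# RID1 spans mileposts 0-2 and RID2 spans 10-12, split into 1-mile segments.
intervals = pd.IntervalIndex.from_arrays(
    [0.0, 1.0, 10.0, 11.0], [1.0, 2.0, 11.0, 12.0], closed="left"
)
index = pd.MultiIndex.from_arrays(
    [["RID1", "RID1", "RID2", "RID2"], intervals], names=["RID", "milepost"]
)
df = pd.DataFrame(
    {"FC": [3, 3, 5, 5], "OWNER": ["State", "State", "County", "County"]},
    index=index,
)

# For each route, ask whether any of its milepost intervals overlap
overlap_by_route = {
    rid: group.index.get_level_values("milepost").is_overlapping
    for rid, group in df.groupby(level="RID")
}
print(overlap_by_route)
```

With `closed="left"`, adjacent segments such as [0, 1) and [1, 2) share only an open endpoint, so they do not count as overlapping.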
I tried 0.25 and got a different error:
Again, it works fine if I define the df as follows:
One more comment: if I run my query as follows, it works regardless of whether the intervals are integer or float:
The only problem is that it is 6 times slower. Because I analyze roadway data for DOTs, I work with millions of rows and need to run these queries very often, so this becomes a real issue.
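One possible way to reduce the per-query cost, sketched under assumptions, works in two steps. First, split the frame by route once. Then resolve a whole batch of mileposts per route with `IntervalIndex.get_indexer`, which returns -1 where no interval matches.

Everything here is hypothetical: the frame, the `fc_on_route` helper, and the assumption that intervals never overlap within a route. That last assumption matters because `get_indexer` requires a non-overlapping index. This has not been benchmarked against the queries discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical two-route frame (made-up values): RID1 covers 0-2, RID2 covers 10-12
intervals = pd.IntervalIndex.from_arrays(
    [0.0, 1.0, 10.0, 11.0], [1.0, 2.0, 11.0, 12.0], closed="left"
)
index = pd.MultiIndex.from_arrays(
    [["RID1", "RID1", "RID2", "RID2"], intervals], names=["RID", "milepost"]
)
df = pd.DataFrame({"FC": [3, 4, 5, 6]}, index=index)

# Split once, up front: one single-level IntervalIndex frame per route
by_route = {rid: sub.droplevel("RID") for rid, sub in df.groupby(level="RID")}


def fc_on_route(rid, mileposts):
    """Hypothetical helper: FC for each milepost on one route, NaN if uncovered.

    Assumes the intervals within a route do not overlap.
    """
    sub = by_route[rid]
    # One vectorized lookup for the whole batch; -1 marks "no containing interval"
    pos = sub.index.get_indexer(mileposts)
    values = sub["FC"].to_numpy(dtype=float)[pos]
    values[pos == -1] = np.nan
    return values


print(fc_on_route("RID1", [0.5, 1.5, 5.0]))  # FC 3 and 4, then NaN for 5.0
```

The split is paid once. Each later batch is a single `get_indexer` call instead of one `.loc` chain per milepost.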
@mahdirajabi96 can you update the original post to include the minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports In particular, can you include a sample demonstrating your code working when the levels are float-dtype intervals rather than integer?
I updated the original post to include a minimal example. As indicated earlier, the problem apparently is not the float or integer type; it is the overlapping intervals.
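The overlap condition described here can be checked directly. In the hypothetical df1-style frame below (labels, values, and level names are made up), each label has a single interval, so no label overlaps with itself. The interval level as a whole is still overlapping, because label1's [0, 2) and label2's [1, 3) share mileposts. This sketch only shows the condition; it does not reproduce the KeyError.

```python
import pandas as pd

# Hypothetical df1-style frame: one interval per label, but the two
# labels' intervals cover some of the same mileposts
intervals = pd.IntervalIndex.from_arrays([0.0, 1.0], [2.0, 3.0], closed="left")
index = pd.MultiIndex.from_arrays(
    [["label1", "label2"], intervals], names=["label", "milepost"]
)
df = pd.DataFrame({"FC": [3, 4]}, index=index)

labels = df.index.get_level_values("label")
ivs = df.index.get_level_values("milepost")

# Across the whole interval level, the intervals overlap ...
level_overlaps = ivs.is_overlapping
# ... even though no single label's intervals overlap
per_label = {lab: ivs[labels == lab].is_overlapping for lab in labels.unique()}
print(level_overlaps, per_label)
```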
Any updates on this ticket? Is anything else needed from me? Just trying to make sure it will be addressed. Thank you all.
@mahdirajabi96 if you can investigate, that would help move this along.
Scenario 1: single-level indexing, which works fine:
which returns:
query results:
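For comparison with Scenario 1, here is a minimal sketch of scalar lookup on a single-level `IntervalIndex`. It assumes a recent pandas and uses made-up FC values. The hypothetical `fc_single_level` helper builds the same two segments with integer edges and with float edges. In both cases, `.loc` resolves the scalar milepost 1.5 to the interval that contains it.

```python
import pandas as pd


def fc_single_level(left, right, milepost):
    """Hypothetical helper: build a single-level IntervalIndex frame and look up FC."""
    idx = pd.IntervalIndex.from_arrays(left, right, closed="left")
    df = pd.DataFrame({"FC": [3, 4]}, index=idx)
    # A scalar .loc key selects the row whose interval contains it
    return df.loc[milepost, "FC"]


print(fc_single_level([0, 1], [1, 2], 1.5))          # integer edges -> 4
print(fc_single_level([0.0, 1.0], [1.0, 2.0], 1.5))  # float edges -> 4
```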
Scenario 2: multi-level indexing:
which returns:
Query method 1: works on both df1 and df2, but is slow.
Query method 2: works only with df2, not with df1, and is 10 times faster than query method 1.
KeyError Traceback (most recent call last)
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2889 try:
-> 2890 return self._engine.get_loc(key)
2891 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1.5
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
in ()
11 display(df)
12 print(df.loc['label1'].loc[1.5])
---> 13 print(df.loc[('label1',1.5)])
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1402 except (KeyError, IndexError, AttributeError):
1403 pass
-> 1404 return self._getitem_tuple(key)
1405 else:
1406 # we by definition only have the 0th axis
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
789 def _getitem_tuple(self, tup):
790 try:
--> 791 return self._getitem_lowerdim(tup)
792 except IndexingError:
793 pass
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
945 return section
946 # This is an elided recursive call to iloc/loc/etc'
--> 947 return getattr(section, self.name)[new_key]
948
949 raise IndexingError("not applicable")
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1402 except (KeyError, IndexError, AttributeError):
1403 pass
-> 1404 return self._getitem_tuple(key)
1405 else:
1406 # we by definition only have the 0th axis
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
789 def _getitem_tuple(self, tup):
790 try:
--> 791 return self._getitem_lowerdim(tup)
792 except IndexingError:
793 pass
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
913 for i, key in enumerate(tup):
914 if is_label_like(key) or isinstance(key, tuple):
--> 915 section = self._getitem_axis(key, axis=i)
916
917 # we have yielded a scalar ?
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1823 # fall thru to straight lookup
1824 self._validate_key(key, axis)
-> 1825 return self._get_label(key, axis=axis)
1826
1827
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
155 raise IndexingError("no slices here, handle elsewhere")
156
--> 157 return self.obj._xs(label, axis=axis)
158
159 def _get_loc(self, key: int, axis: int):
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
3728
3729 if axis == 1:
-> 3730 return self[key]
3731
3732 self._consolidate_inplace()
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2973 if self.columns.nlevels > 1:
2974 return self._getitem_multilevel(key)
-> 2975 indexer = self.columns.get_loc(key)
2976 if is_integer(indexer):
2977 indexer = [indexer]
C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2890 return self._engine.get_loc(key)
2891 except KeyError:
-> 2892 return self._engine.get_loc(self._maybe_cast_indexer(key))
2893 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2894 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1.5
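The traceback suggests that the tuple key ('label1', 1.5) gets split up and that 1.5 ends up being looked up as a column label: `frame.py __getitem__` calls `self.columns.get_loc(key)`. Until that is resolved, one possible workaround is a boolean mask built from `get_level_values` and `IntervalIndex.contains`. It does not depend on the interval level being non-overlapping.

The frame and the `rows_at` helper are hypothetical. The mask scans every row, so whether it beats query method 1 on millions of rows is untested.

```python
import pandas as pd

# Hypothetical df1-style frame (made-up values): label1's [0, 2) and
# label2's [1, 3) overlap across labels
intervals = pd.IntervalIndex.from_arrays([0.0, 1.0], [2.0, 3.0], closed="left")
index = pd.MultiIndex.from_arrays(
    [["label1", "label2"], intervals], names=["label", "milepost"]
)
df = pd.DataFrame({"FC": [3, 4]}, index=index)


def rows_at(frame, label, milepost):
    """Hypothetical helper: rows for `label` whose interval contains `milepost`."""
    labels = frame.index.get_level_values("label")
    ivs = frame.index.get_level_values("milepost")
    # IntervalIndex.contains tests every interval against the scalar, elementwise
    mask = (labels == label) & ivs.contains(milepost)
    return frame[mask]


print(rows_at(df, "label1", 1.5))  # the single label1 row, FC == 3
```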