[BUG] KeyMap fails with large keys and Pandas 3.0.0

# Pandas 3.0.0 Compatibility Issue in hedtools KeyMap

## Executive Summary

**Issue**: hedtools 0.9.0's `KeyMap` class fails with pandas 3.0.0 when it creates a `pd.Series` from a dictionary whose keys are certain large integer hash values.

**Location**: `hed/tools/analysis/key_map.py`, line 149:
```python
map_series = pd.Series(self.map_dict)
```

**Root Cause**: Pandas 3.0.0 has a bug in `pd.Series._init_dict()` that causes it to incorrectly handle dictionaries with large negative integer keys that trigger RangeIndex optimization attempts, leading to integer overflow and index length mismatches.

**Workaround**: Create the Index separately before creating the Series.

## Detailed Analysis

### The Failing Code

In `key_map.py` line 149, KeyMap tries to create a Series from its map_dict:

```python
def _remap(self, df):
    # ... earlier code ...
    map_series = pd.Series(self.map_dict)  # <-- FAILS HERE with pandas 3.0
```

### The Problem Dict

The issue occurs with hash dictionaries like:
```python
{-4186896901282141619: 0, -8311529505453501279: 1}
```

These are legitimate hash values generated by KeyMap for string keys like '6' and '2'.
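
As a quick sanity check, the sketch below confirms that the two keys above are ordinary int64 values, so nothing about the dict itself is malformed. The `row_key` helper is hypothetical and only assumes that KeyMap derives its keys from Python's built-in `hash()` on a tuple of column values; the actual hedtools implementation may differ.

```python
# Bounds of a signed 64-bit integer, the dtype pandas uses for integer indexes.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# The keys from the failing dict in this report.
problem_keys = [-4186896901282141619, -8311529505453501279]
keys_fit_int64 = all(INT64_MIN <= k <= INT64_MAX for k in problem_keys)


def row_key(values):
    """Hypothetical stand-in for KeyMap's key hashing (assumption, not hedtools code)."""
    return hash(tuple(values))


# The concrete value changes between interpreter runs because of hash randomization.
example_key = row_key(["6"])
print(keys_fit_int64, INT64_MIN <= example_key <= INT64_MAX)
```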

### Pandas 3.0 Bug Details

When `pd.Series(dict)` is called, pandas 3.0's `_init_dict` method:

1. Extracts keys and values from the dict
2. Attempts to optimize by creating a RangeIndex if keys appear sequential
3. With large negative integers, this calculation can overflow: a value derived from `keys[0]` and `diff` falls outside the int64 range
4. The overflow corrupts the index creation, resulting in an empty index (length 0)
5. When trying to create the Series with values (length 2) and empty index (length 0), it raises:
   ```
   ValueError: Length of values (2) does not match length of index (0)
   ```
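
The arithmetic behind step 3 can be checked with plain Python integers, which do not overflow. This is only a sketch: it assumes the optimization needs a range "stop" value of the form last key plus step, which has not been confirmed against the pandas source. Under that assumption, the stop value for the failing dict lies below the int64 minimum.

```python
INT64_MIN = -(2**63)

# Keys from the failing dict, in insertion order.
k0, k1 = -4186896901282141619, -8311529505453501279

# Step between consecutive keys, as a range-style index would need it.
step = k1 - k0

# Exclusive stop of range(k0, stop, step); this is an assumed intermediate value.
stop = k1 + step

# Python's arbitrary-precision ints represent the range without trouble ...
as_range = list(range(k0, stop, step))

# ... but the stop value cannot be stored in a signed 64-bit integer.
stop_underflows_int64 = stop < INT64_MIN
print(step, stop, stop_underflows_int64)
```

Because the keys and the step are all negative, the out-of-range value sits below the int64 minimum rather than above the maximum.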

### Reproduction

**Fails:**
```python
import pandas as pd  # version 3.0.0
d = {-4186896901282141619: 0, -8311529505453501279: 1}
s = pd.Series(d)  # ValueError: Length of values (2) does not match length of index (0)
```

**Works:**
```python
import pandas as pd  # version 3.0.0
d = {-4186896901282141619: 0, -8311529505453501279: 1}
idx = pd.Index(list(d.keys()))
s = pd.Series(list(d.values()), index=idx)  # SUCCESS
```

**Works with pandas 2.x:**
```python
import pandas as pd  # version 2.3.3
d = {-4186896901282141619: 0, -8311529505453501279: 1}
s = pd.Series(d)  # SUCCESS
```

## Why It Doesn't Always Fail

The issue is triggered by:
1. **Specific hash values** that have large magnitudes and specific differences
2. **Hash randomization** - Python randomizes string hashes per interpreter process (unless `PYTHONHASHSEED` is fixed), so the generated keys change between sessions
3. **Pandas optimization heuristics** - RangeIndex optimization only triggers for certain patterns

This explains why:
- Simple test cases often work (hash values differ)
- First operation may succeed but second fails (different hash values)
- Behavior varies between Python sessions (hash randomization)
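
Because string hashes depend on the interpreter's hash seed, pinning `PYTHONHASHSEED` is one way to make a failing session repeatable. The sketch below starts fresh interpreters with a chosen seed and compares the resulting hashes; the helper name `hash_in_fresh_interpreter` is made up for illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed: the hash is reproducible across processes.
first = hash_in_fresh_interpreter("6", seed=1)
second = hash_in_fresh_interpreter("6", seed=1)

# Different seed: the hash is expected to change (not guaranteed, but overwhelmingly likely).
other = hash_in_fresh_interpreter("6", seed=2)
print(first == second, first == other)
```

Once a seed that triggers the failure is found, running the test suite with that `PYTHONHASHSEED` value should reproduce it consistently.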

## Recommended Fixes

### Option 1: Fix in hedtools (Recommended)

**File**: `hed/tools/analysis/key_map.py`, line 149

**Before:**
```python
map_series = pd.Series(self.map_dict)
```

**After:**
```python
# Workaround for pandas 3.0 bug with large integer dict keys
if self.map_dict:
    idx = pd.Index(list(self.map_dict.keys()))
    map_series = pd.Series(list(self.map_dict.values()), index=idx)
else:
    map_series = pd.Series(self.map_dict)
```
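
If the same pattern is needed in more than one place, the workaround could be factored into a small helper. This is a minimal sketch: `series_from_dict` is a hypothetical name, not an existing hedtools or pandas function. It relies only on `pd.Index` and `pd.Series`, as in the working reproduction above.

```python
import pandas as pd


def series_from_dict(mapping):
    """Build a Series from a dict without passing the dict to pd.Series directly.

    Hypothetical helper: the index is created explicitly from the keys, so the
    Series constructor never has to infer an index from the dict itself.
    """
    index = pd.Index(list(mapping.keys()))
    return pd.Series(list(mapping.values()), index=index)


problem_dict = {-4186896901282141619: 0, -8311529505453501279: 1}
map_series = series_from_dict(problem_dict)

# Lookups by the original hash keys return the stored positions.
print(map_series[-4186896901282141619], map_series[-8311529505453501279])
```

Building the index explicitly also covers the empty-dict case, so the `if`/`else` branch from the inline version is not needed in the helper.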

### Option 2: Constrain pandas version (Temporary)

In consuming packages (like table-remodeler):

**pyproject.toml:**
```toml
dependencies = [
    "pandas>=2.2.3,<3.0",
]
```

## Impact Assessment

### Affected Code
- **hedtools**: `KeyMap` class in `hed/tools/analysis/key_map.py`
- **table-remodeler**: Any operation using `RemapColumnsOp` with integer sources
- **Potential**: Any hedtools code path that uses KeyMap with hash-based lookups

### Severity
- **High**: Causes complete operation failure
- **Intermittent**: Depends on hash values (session-dependent)
- **Hard to detect**: May pass in testing but fail in production

### When It Occurs
- Multiple cascading remap operations
- Large datasets with many unique values
- String columns converted to integer types
- Operations that create intermediate columns used as sources for subsequent operations

## Testing

### Minimal Reproduction Test
```python
import pandas as pd
assert pd.__version__ == '3.0.0', "Test requires pandas 3.0.0"

# This specific dict triggers the bug
problem_dict = {-4186896901282141619: 0, -8311529505453501279: 1}

try:
    series = pd.Series(problem_dict)
    print("UNEXPECTED: Series created successfully - bug may be fixed")
except ValueError as e:
    if "Length of values" in str(e) and "does not match length of index" in str(e):
        print("BUG CONFIRMED: Pandas 3.0 dict-to-Series bug reproduced")
    else:
        print(f"DIFFERENT ERROR: {e}")
```

### Full Integration Test
See: `table-remodeler/.status/test_exact_scenario.py`

## Status

- **Reported**: January 27, 2026
- **Pandas Version Affected**: 3.0.0
- **Pandas Versions Working**: 2.3.3 and earlier
- **hedtools Version**: 0.9.0
- **Workaround Implemented**: pandas version constraint in table-remodeler
- **Permanent Fix Needed**: In hedtools KeyMap class

## References

- hedtools repository: https://github.com/hed-standard/hed-python
- Pandas issue tracker: https://github.com/pandas-dev/pandas/issues
- Related warnings: `RuntimeWarning: overflow encountered in scalar add/subtract` in `pandas/core/indexes/base.py`

## Recommendations for hedtools Maintainers

1. **Immediate**: Apply the workaround in `KeyMap._remap()`
2. **Short-term**: Add test coverage for large integer hash dictionaries
3. **Report**: File a bug report with the pandas team, including a minimal reproduction case
4. **Version constraint**: Consider adding a `pandas<3.0` constraint until pandas fixes the issue or hedtools applies the workaround
5. **Documentation**: Add note about pandas 3.0 compatibility in changelog

## Recommendations for table-remodeler

1. **Current**: The `pandas<3.0` constraint is an appropriate workaround ✓
2. **Monitor**: Watch for hedtools updates with pandas 3.0 support
3. **Future**: Remove constraint once hedtools addresses the issue
4. **Documentation**: Note the pandas version requirement in user-facing docs

