Copilot AI commented Oct 26, 2025

The characterize method fails with a cryptic "Index contains duplicate entries, cannot reshape" error when region/sector columns don't use the exact names "region" or "sector" (lowercase). This occurs because misnamed columns aren't added to the index, creating duplicates during unstack.
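The failure mode can be reproduced with plain pandas, independent of pymrio. The sketch below is illustrative only: the table contents, the `impact` and `factor` column names, and the choice of index columns are assumptions, not pymrio's actual internals. It shows that once the misnamed `Region` column is left out of the index, the remaining index has repeated entries and `unstack` raises.

```python
import pandas as pd

# Hypothetical factors table: the region column is capitalized as 'Region'.
factors = pd.DataFrame(
    {
        "Region": ["reg1", "reg2"],
        "stressor": ["CO2", "CO2"],
        "impact": ["GWP", "GWP"],
        "factor": [1.0, 1.2],
    }
)

# Assumption: only recognized columns go into the index, so 'Region' is left out.
series = factors.set_index(["impact", "stressor"])["factor"]

# Both rows now share the index entry ('GWP', 'CO2').
has_duplicates = bool(series.index.duplicated().any())

try:
    series.unstack("stressor")
    unstack_error = None
except ValueError as err:
    unstack_error = str(err)

print(has_duplicates)
print(unstack_error)
```

Checking `index.duplicated()` before calling `unstack` is what makes it possible to raise a more descriptive error at that point.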

Changes

  • Added duplicate index validation in Extension.characterize() before unstacking
  • Implemented smart error detection that identifies:
    • Case mismatches (Region, REGION vs region)
    • Common alternatives (country, reg, location, industry)
  • Added module constants ALTERNATIVE_REGION_NAMES and ALTERNATIVE_SECTOR_NAMES for maintainability
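As a rough illustration of the detection logic listed above, the following sketch works on a list of column names only. The function name `suggest_column_fixes`, the exact contents of the two constants, and the hint wording are assumptions made for this example; the actual implementation in the PR may differ.

```python
# Assumed contents; the PR only names these constants and a few example values.
ALTERNATIVE_REGION_NAMES = ("country", "reg", "location")
ALTERNATIVE_SECTOR_NAMES = ("industry",)


def suggest_column_fixes(columns):
    """Return hints for columns that look like misnamed 'region'/'sector' columns.

    Hypothetical helper, not part of pymrio's public API.
    """
    hints = []
    expected_names = (
        ("region", ALTERNATIVE_REGION_NAMES),
        ("sector", ALTERNATIVE_SECTOR_NAMES),
    )
    for expected, alternatives in expected_names:
        if expected in columns:
            continue  # correctly named column is present, nothing to suggest
        # Same name, different case, e.g. 'Region' or 'REGION'.
        case_variants = [c for c in columns if str(c).lower() == expected]
        # Common alternative names, e.g. 'country' or 'industry'.
        alt_names = [c for c in columns if str(c).lower() in alternatives]
        if case_variants:
            hints.append(
                f"Found possible {expected} column with different case: "
                f"{case_variants}. Please rename it to '{expected}' (lowercase)."
            )
        elif alt_names:
            hints.append(
                f"Found possible {expected} column under another name: "
                f"{alt_names}. Please rename it to '{expected}'."
            )
    return hints


for hint in suggest_column_fixes(["Region", "stressor", "industry"]):
    print(hint)
```

Keeping the alternative names in module-level tuples means new spellings can be added without touching the detection logic.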

Example

Before:

factors = pd.DataFrame({'Region': [...], 'stressor': [...], ...})  # Note: 'Region' not 'region'
io.emissions.characterize(factors)
# ValueError: Index contains duplicate entries, cannot reshape

After:

io.emissions.characterize(factors)
# ValueError: Duplicate indices found in characterization factors.
# This typically occurs when region or sector specific characterization factors
# are provided but the column names don't match pymrio's expectations.
#
# Expected column names:
#   - 'region' (lowercase) for region-specific factors
#   - 'sector' (lowercase) for sector-specific factors
#
# Current columns: ['Region', 'stressor', ...]
#
# Found possible region column with different case: ['Region']
# Please rename it to 'region' (lowercase).

Testing

Added test_characterize_error_wrong_column_names() covering case mismatches and alternative names. All existing tests pass.

Original prompt

This section details the original issue you should resolve

<issue_title>Improve error message for characterize</issue_title>
<issue_description>If one wants to do region/sector-specific characterization and the region/sector column names do not match the pymrio specs, the characterize code fails at the very last step with a "duplicate index" error in unstack.

This should be checked beforehand.
Possible solutions:

  • make sector/region explicit in the arguments
  • check for duplicate index and give a more meaningful error message</issue_description>

Comments on the Issue (you are @copilot in this section)



@coveralls

Pull Request Test Coverage Report for Build 18818197488

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 87.243%

Totals Coverage Status
Change from base Build 16274124697: 0.0%
Covered Lines: 2626
Relevant Lines: 3010

💛 - Coveralls

Copilot AI and others added 2 commits October 26, 2025 13:00
Co-authored-by: konstantinstadler <6782923+konstantinstadler@users.noreply.github.com>
Co-authored-by: konstantinstadler <6782923+konstantinstadler@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve error message for characterize when index duplicates" to "Improve error message for characterize when region/sector columns are misnamed" Oct 26, 2025
Copilot finished work on behalf of konstantinstadler October 26, 2025 13:08
Development

Successfully merging this pull request may close these issues.

Improve error message for characterize
