
[Fix] Merge class lists when concatenating datasets with different types#12437

Open
crawfordxx wants to merge 1 commit into open-mmlab:main from crawfordxx:fix-concat-different-dataset-types

@crawfordxx

Motivation

Fixes #8890

When concatenating datasets with different class sets (e.g. CocoDataset + VOCDataset), ConcatDataset stored _metainfo as a list of dicts. This broke downstream evaluation code that expects metainfo to be a single dict with a classes key, causing IndexError in _det2json when cat_ids[label] was accessed with out-of-range labels.
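A toy reproduction of the failure mode, assuming `cat_ids` ends up built from fewer classes than the predicted labels span. `build_cat_ids` and `labels_to_cat_ids` are hypothetical stand-ins for illustration, not the actual mmdet `_det2json` code.

```python
from typing import Dict, List


def build_cat_ids(metainfo: Dict) -> List[int]:
    # Hypothetical toy mapping: one 1-based category id per class, in class order.
    return list(range(1, len(metainfo["classes"]) + 1))


def labels_to_cat_ids(cat_ids: List[int], labels: List[int]) -> List[int]:
    # Mirrors the cat_ids[label] lookup: any label >= len(cat_ids) raises IndexError.
    return [cat_ids[label] for label in labels]


# cat_ids built from a 2-class metainfo, while predictions use a 3-class label space.
cat_ids = build_cat_ids({"classes": ("person", "car")})
try:
    labels_to_cat_ids(cat_ids, [0, 1, 2])
    failed = False
except IndexError:
    failed = True
```

With a single metainfo dict whose `classes` cover every label, the same lookup succeeds.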

Modification

Added a _merge_metainfo() method to ConcatDataset that:

  • Merges class lists from all sub-datasets into a unified ordered set (preserving insertion order, deduplicating shared classes)
  • Merges palette colours from each dataset, with a deterministic fallback for classes without a defined colour
  • Returns a single merged metainfo dict instead of the list fallback, so evaluation works correctly with heterogeneous dataset concatenation
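A minimal standalone sketch of the merging rules listed above, written as a free function rather than as a `ConcatDataset` method. `merge_metainfo`, `fallback_colour` and the MD5-derived fallback colour are assumptions for illustration, not the PR's code; the palettes in the usage lines are made-up values.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Colour = Tuple[int, int, int]


def fallback_colour(name: str) -> Colour:
    # Hypothetical deterministic fallback: derive RGB from an MD5 digest of the
    # class name. (Python's built-in hash() is salted per process, so avoid it.)
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return (digest[0], digest[1], digest[2])


def merge_metainfo(metainfos: Sequence[dict]) -> dict:
    """Merge per-dataset metainfo dicts into one dict with a unified class list."""
    seen: Dict[str, None] = {}       # dict keys act as an insertion-ordered set
    colours: Dict[str, Colour] = {}
    for info in metainfos:
        classes = list(info.get("classes", ()))
        palette = list(info.get("palette", ()))
        for idx, name in enumerate(classes):
            seen.setdefault(name, None)  # shared classes are kept only once
            # Keep the first colour defined for a class; later datasets don't override it.
            if name not in colours and idx < len(palette):
                colours[name] = tuple(palette[idx])
    merged_classes = tuple(seen)
    merged_palette: List[Colour] = [
        colours[name] if name in colours else fallback_colour(name)
        for name in merged_classes
    ]
    return {"classes": merged_classes, "palette": merged_palette}


coco_like = {"classes": ("person", "car"), "palette": [(220, 20, 60), (0, 0, 142)]}
voc_like = {"classes": ("aeroplane", "person"), "palette": [(106, 0, 228)]}
merged = merge_metainfo([coco_like, voc_like])
```

Here `merged["classes"]` is `('person', 'car', 'aeroplane')`: `person` appears once, in the position where it was first seen.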

Also updated the metainfo update logic in full_init to use an isinstance check instead of the removed is_all_same flag.
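A minimal sketch of the type-based branching, assuming the update adds extra keys to the metainfo. `update_metainfo` and the `cumulative_sizes` key used below are hypothetical examples, not the PR's code; the list branch only matters if a list-of-dicts metainfo can still reach this point.

```python
from typing import Dict, List, Union

Metainfo = Union[Dict, List[Dict]]


def update_metainfo(metainfo: Metainfo, extra: Dict) -> None:
    # Hypothetical helper: branch on the metainfo's own type instead of a
    # separately tracked is_all_same flag.
    if isinstance(metainfo, dict):
        metainfo.update(extra)       # single (possibly merged) metainfo dict
    else:
        for info in metainfo:        # legacy list-of-dicts shape
            info.update(extra)


merged_info = {"classes": ("person", "car")}
update_metainfo(merged_info, {"cumulative_sizes": [10, 25]})
```

Checking the value itself keeps the branch correct even if the metainfo shape is decided elsewhere.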

BC-breaking (Yes/No)

No. When all datasets have the same classes, behaviour is unchanged. When they differ, the merged dict is a strict improvement over the previous list-of-dicts fallback, which downstream code could not use.

Checklist

  • Pre-commit hooks pass (pre-commit run --all-files)
  • Unit test added (tests/test_datasets/test_dataset_wrappers.py)

When concatenating datasets with different class sets (e.g. CocoDataset
+ VOCDataset), ConcatDataset previously stored metainfo as a list of
dicts. This broke downstream evaluation code that expects metainfo to be
a single dict with a 'classes' key, causing IndexError in _det2json
when cat_ids[label] was accessed with out-of-range labels.

This fix adds a _merge_metainfo method that merges class lists from all
sub-datasets into a unified set, preserving palette colours from each
dataset. The merged metainfo dict is used instead of the list fallback,
so evaluation works correctly with heterogeneous dataset concatenation.

Fixes open-mmlab#8890
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


majianhan seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.


Development

Successfully merging this pull request may close these issues.

concat 2 diff types dataset train ,val and test issue

3 participants