About filtering duplicate relations for VG dataset #129

It seems to me the following code snippet doesn't work as expected:

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/d0ffa40d92133d7d865e531146de82c8c8a344c0/maskrcnn_benchmark/data/datasets/visual_genome.py#L148-L156

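I was thinking that filtering out duplicate relations means removing exactly repeated relation triplets (i.e., the subject, the object, and the predicate are all the same). However, this snippet seems to keep only a single predicate for each object pair, picked with `np.random.choice`, so a predicate that occurs more times for that pair is more likely to be chosen. This seems unreasonable to me. It also makes the following snippet redundant, since after filtering every object pair has exactly one predicate and the branch that randomly overwrites an existing `relation_map` entry is never reached: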

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/d0ffa40d92133d7d865e531146de82c8c8a344c0/maskrcnn_benchmark/data/datasets/visual_genome.py#L162-L164
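A minimal standalone sketch of why that branch becomes dead code. It mirrors the linked `relation_map` loop, with a few assumptions: `build_relation_map` is a hypothetical helper name, NumPy replaces the `torch` tensor used in the repository, a `random.Random` instance replaces the global `random.random()` for reproducibility, and the `else` branch (assign directly when the pair is still empty) is assumed rather than taken from the linked lines. It counts how often the "pair already labelled" branch is hit.

```python
import random

import numpy as np


def build_relation_map(relation, num_box, rng):
    """Hypothetical re-creation of the relation_map loop.

    Returns the dense (num_box x num_box) predicate map and the number of
    times the 'pair already has a predicate' branch was reached.
    """
    relation_map = np.zeros((num_box, num_box), dtype=np.int64)
    overwrite_checks = 0
    for i in range(relation.shape[0]):
        s, o, p = int(relation[i, 0]), int(relation[i, 1]), int(relation[i, 2])
        if relation_map[s, o] > 0:
            # Pair already labelled: keep the old or new predicate with 50% chance.
            overwrite_checks += 1
            if rng.random() > 0.5:
                relation_map[s, o] = p
        else:
            # Assumed else branch: first predicate seen for this pair.
            relation_map[s, o] = p
    return relation_map, overwrite_checks


rng = random.Random(0)
one_per_pair = np.array([[0, 1, 5], [2, 3, 4]], dtype=np.int32)            # current filtering output
multi_label = np.array([[0, 1, 5], [0, 1, 7], [2, 3, 4]], dtype=np.int32)  # proposed filtering output

_, hits = build_relation_map(one_per_pair, 4, rng)
print(hits)  # 0
rmap, hits = build_relation_map(multi_label, 4, rng)
print(hits, rmap[0, 1] in (5, 7))  # 1 True
```

With one predicate per pair the branch is never reached; with several predicates per pair it is reached, and the dense map still stores only one of them.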

To accommodate multiple labels for each object pair, I think we have to change L148-L156 to the following:
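```python
if self.filter_duplicate_rels:
    # Filter out dupes! Keep every distinct predicate per (subject, object) pair,
    # dropping only exact (subject, object, predicate) repeats.
    assert self.split == 'train'
    old_size = relation.shape[0]
    all_rel_sets = defaultdict(set)  # (o0, o1) -> set of distinct predicates
    for (o0, o1, r) in relation:
        all_rel_sets[(o0, o1)].add(r)
    # Emit one row per distinct triplet instead of sampling a single predicate.
    relation = [(k[0], k[1], v) for k, vs in all_rel_sets.items() for v in vs]
    relation = np.array(relation, dtype=np.int32)
```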
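A minimal standalone sketch contrasting the two behaviours on a made-up toy relation array. `dedup_one_per_pair` (the current logic) and `dedup_triplets` (the proposed change) are hypothetical helper names; the first uses `np.random.default_rng` instead of the global `np.random.choice` only so the run is reproducible. It also shows that `np.unique(relation, axis=0)` keeps the same unique triplets, which could serve as a vectorized alternative, with the difference that its rows come back sorted lexicographically.

```python
from collections import defaultdict

import numpy as np


def dedup_one_per_pair(relation, rng):
    """Current behaviour: keep one randomly sampled predicate per (subj, obj) pair."""
    all_rel_sets = defaultdict(list)
    for o0, o1, r in relation:
        all_rel_sets[(o0, o1)].append(r)  # repeats kept, so frequent predicates are likelier
    return np.array([(k[0], k[1], rng.choice(v)) for k, v in all_rel_sets.items()],
                    dtype=np.int32)


def dedup_triplets(relation):
    """Proposed behaviour: drop only exact (subj, obj, predicate) repeats."""
    all_rel_sets = defaultdict(set)
    for o0, o1, r in relation:
        all_rel_sets[(o0, o1)].add(r)
    return np.array([(k[0], k[1], v) for k, vs in all_rel_sets.items() for v in vs],
                    dtype=np.int32)


# Toy relations: pair (0, 1) has predicate 5 twice and predicate 7 once.
relation = np.array([[0, 1, 5], [0, 1, 5], [0, 1, 7], [2, 3, 4]], dtype=np.int32)

rng = np.random.default_rng(0)
print(len(dedup_one_per_pair(relation, rng)))  # 2: one row per object pair
print(sorted(map(tuple, dedup_triplets(relation).tolist())))
# [(0, 1, 5), (0, 1, 7), (2, 3, 4)]

# Same unique triplets, but sorted rather than grouped by first appearance.
print(np.unique(relation, axis=0).tolist())
# [[0, 1, 5], [0, 1, 7], [2, 3, 4]]
```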


For reference, the current code at L148-L156:

```python
if self.filter_duplicate_rels:
    # Filter out dupes!
    assert self.split == 'train'
    old_size = relation.shape[0]
    all_rel_sets = defaultdict(list)
    for (o0, o1, r) in relation:
        all_rel_sets[(o0, o1)].append(r)
    relation = [(k[0], k[1], np.random.choice(v)) for k,v in all_rel_sets.items()]
    relation = np.array(relation, dtype=np.int32)
```

and at L162-L164:

```python
if relation_map[int(relation[i,0]), int(relation[i,1])] > 0:
    if (random.random() > 0.5):
        relation_map[int(relation[i,0]), int(relation[i,1])] = int(relation[i,2])
```