In [speech_command_classification_with_torchaudio_tutorial.py](https://github.com/pytorch/tutorials/blob/master/intermediate_source/speech_command_classification_with_torchaudio_tutorial.py), the test and validation datasets are included in the training dataset.

When doing:

https://github.com/pytorch/tutorials/blob/bb3523fe06116c6671319d476f5d2725d0406c58/intermediate_source/speech_command_classification_with_torchaudio_tutorial.py#L95-L96

and checking for the intersection of speakers:

```python
train_speaker_ids = set(s[3] for s in train_set)
test_speaker_ids = set(s[3] for s in test_set)
intersection_of_speakers = train_speaker_ids & test_speaker_ids
print(len(train_speaker_ids))
print(len(test_speaker_ids))
print(len(intersection_of_speakers))
```

it prints:

```
2618
250
250
```

Hence all 250 speakers in the test set are also present in the training dataset. The same holds for the validation set.
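One way to avoid the leak is to drop from the training set every sample whose speaker appears in the test or validation set. A minimal, self-contained sketch of that filtering, using dummy tuples in the SPEECHCOMMANDS item layout (`waveform, sample_rate, label, speaker_id, utterance_number`, so index 3 is the speaker id); the sample data here is made up for illustration:

```python
# Dummy items mimicking the SPEECHCOMMANDS tuple layout:
# (waveform, sample_rate, label, speaker_id, utterance_number).
train_set = [
    (None, 16000, "yes", "spk_a", 0),
    (None, 16000, "no", "spk_b", 0),
    (None, 16000, "yes", "spk_c", 0),
]
test_set = [(None, 16000, "no", "spk_b", 1)]
val_set = [(None, 16000, "yes", "spk_c", 1)]

# Speakers that must not appear in training.
held_out_speakers = {s[3] for s in test_set} | {s[3] for s in val_set}

# Keep only training samples whose speaker is not held out.
clean_train_set = [s for s in train_set if s[3] not in held_out_speakers]

# Verify the splits are now speaker-disjoint.
train_speakers = {s[3] for s in clean_train_set}
assert not (train_speakers & held_out_speakers)
print(len(clean_train_set))  # 1
```

Alternatively, if the installed torchaudio version supports it, passing `subset="training"` / `"validation"` / `"testing"` to `torchaudio.datasets.SPEECHCOMMANDS` should yield the official disjoint splits directly.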