Overlapping items in development / evaluation / validation sets #3

Open
keunwoochoi opened this issue Aug 17, 2022 · 0 comments

Hi, thanks for the great work!
I'm new to the dataset, and I found overlapping entries between the splits, as listed below.

Between valid and eval sets:

  • sound_id 66304. In valid set, the segment is [1607168, 2436248]. In eval, it's [1597440, 2865471].
    • file name: 01 A pug struggles to breathe 1_14_2008.wav
  • sound_id 86161. [263168, 1253213] vs [173056, 1249146].
    • file name: Bus(Drive_Reverse)_1-2.wav

Between valid and dev sets:

  • sound_id 86163. Its whole segment was used in both.
    • file name: City Ambience w_ Car Passing_1-2.wav
  • sound_id 130603. [0, 822465] vs [11264, 768067].
    • Their file names vary. (Greek Chat2 - (Apollonia__39_s sPA) 18_44 05.10.wav // Greek Chat2 - (Apollonia's sPA) 18_44 05.10.wav)

Between eval and dev sets:

  • sound_id 137692. [635904, 1430834] vs [141824, 1370009].
    • file name: FREEZER_DOOR_OPEN_CLOSE.wav
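
For anyone who wants to reproduce the check, here is a minimal sketch of how overlaps like the ones above can be detected. It assumes each split is available as a list of (sound_id, start, end) rows, that the bracketed numbers are sample offsets into the source file, and that where only "A vs B" is given the first range belongs to the first-named split. The `splits` layout and the function names are hypothetical, not part of the Clotho tooling. sound_id 86163 is left out because no offsets are listed for it.

```python
from itertools import combinations

# (sound_id, start, end) rows per split, copied from the list above.
# Assumption: for "A vs B" entries, A belongs to the first-named split.
splits = {
    "valid": [(66304, 1607168, 2436248), (86161, 263168, 1253213), (130603, 0, 822465)],
    "eval": [(66304, 1597440, 2865471), (86161, 173056, 1249146), (137692, 635904, 1430834)],
    "dev": [(130603, 11264, 768067), (137692, 141824, 1370009)],
}


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two ranges (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def find_overlaps(splits):
    """Return (sound_id, split_a, split_b, shared_length) for every
    pair of segments from different splits that share a sound_id and overlap."""
    found = []
    for (name_a, rows_a), (name_b, rows_b) in combinations(splits.items(), 2):
        for id_a, s_a, e_a in rows_a:
            for id_b, s_b, e_b in rows_b:
                if id_a != id_b:
                    continue
                shared = overlap(s_a, e_a, s_b, e_b)
                if shared > 0:
                    found.append((id_a, name_a, name_b, shared))
    return found


overlaps = find_overlaps(splits)
for sound_id, a, b, shared in overlaps:
    print(f"sound_id {sound_id}: {a} vs {b} share {shared} units")
```

Matching on sound_id rather than file name is deliberate here, since the file names of the same sound can differ between splits (as with 130603).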

I'm wondering whether this is a known issue in the (DCASE?) research community. If so, is there a recommended way to handle it? If not, perhaps it could be fixed in Clotho 2.2 or a later release.
