Add Diar-az #39
Conversation
I think this should fall into "Other software" instead of "Diarization dataset". This is not a new dataset; it's just a format conversion tool, is that correct?
It's a tool specifically for the RÚV-DI dataset.
If so, we should add ruv as a dataset, and this repo as "Other Software".
The dataset was never published, only the resulting models. Also, yes, that dataset should be added, but it was lost in a cybersecurity attack on Reykjavik University's servers in January 2024. If you want, you could put a placeholder text for the RÚV-DI dataset here in this repo, and we could try to recreate the dataset. We have a license that lists all the shows and episodes contained within the dataset, so we could recreate it from that.
Other software works in my opinion.
Yes, I think Other software works and may be a better fit, as it's not really a dataset; rather, it was a tool to support the RÚV-DI dataset. To correct this, should this pull request just be updated, or should a new one be created?
I'm OK either way.
Fixed, added to Other software.
@afk0901 I believe you also need to put the placeholder text for the dataset for this PR to be properly closed. In terms of recreating the dataset, I believe it's actually best if @wq2012 recreates the dataset with Daan and Pet of Google, and @afk0901 finishes our writeup of this dataset creation. When we are both done, we compare notes on arXiv and write the dataset paper together for Interspeech, ICASSP, SAND 2025, or WAND in October.
For continuity and clarity, I believe it's best if my second paragraph is dealt with separately, not in this PR. Thus I have created a new issue for it within this repo.
I didn't see the change. |
Force-pushed from f32caa9 to 8d1a453
README.md
@@ -295,6 +296,7 @@ Team in the Inaugural DIHARD Challenge](https://www.isca-speech.org/archive/pdfs
| [VoxConverse](https://github.com/joonson/voxconverse) | TBD | TBD | Free | VoxConverse is an audio-visual diarisation dataset consisting of over 50 hours of multispeaker clips of human speech, extracted from YouTube videos |
| [MiniVox Benchmark](https://github.com/doerlbh/MiniVox) | [MiniVox Benchmark](https://github.com/doerlbh/MiniVox) | en | Free | MiniVox is an automatic framework to transform any speaker-labelled dataset into continuous speech datastream with episodically revealed label feedbacks. |
| [The AliMeeting Corpus](https://github.com/yufan-aslp/AliMeeting) | Together with audios | zh | Free | |
| RÚV-DI dataset | TBD | is | TBD | |
Please remove this.
Removed.
Add Diar-az
Diar-az creates files for a diarization corpus from Gecko output, and provides organization, cleaning, and correction of data for round-tripping between Kaldi/corpus and Gecko formats and back.
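To illustrate the kind of conversion involved, here is a minimal sketch of turning Kaldi-style RTTM `SPEAKER` records into a simplified Gecko-like JSON structure. This is not Diar-az's actual code or API; the JSON field names (`monologues`, `speaker`, `start`, `end`) are a simplified approximation of Gecko's export format, assumed for illustration.

```python
import json


def rttm_to_gecko(rttm_lines):
    """Convert RTTM SPEAKER lines into a simplified, Gecko-like JSON string.

    RTTM SPEAKER records have the form:
      SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    monologues = []
    for line in rttm_lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip non-speaker records
        onset, duration = float(fields[3]), float(fields[4])
        monologues.append({
            "speaker": {"id": fields[7]},
            "start": onset,
            "end": onset + duration,  # RTTM stores duration, Gecko-style uses end time
        })
    return json.dumps({"monologues": monologues}, indent=2)
```

The reverse direction (Gecko JSON back to Kaldi `segments`/`utt2spk`/RTTM) would walk the `monologues` list and emit one record per segment.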