SODA Dataset for Training #35

farzadab · 2024-06-21T16:42:14Z

No description provided.

ultravox/tools/infer_tool.py

ultravox/tools/infer_api.py

ultravox/evaluation/eval_types.py

ultravox/evaluation/gpt_eval.py

farzadab · 2024-06-25T21:25:35Z

The PR is ready.

ultravox/evaluation/gpt_eval_boolq.py

juberti · 2024-07-06T00:08:20Z

ultravox/data/datasets.py

+        roles = ["user", "assistant"] if len(turns) % 2 == 0 else ["assistant", "user"]
+
+        num_prompts = min(self._args.num_prompts, len(self.SYS_PROMPTS))
+        sys_prompt = self.SYS_PROMPTS[idx % num_prompts]


Did we end up using an RNG for this sort of thing rather than the index?

(I forget where, but we discussed adding a private RNG to datasets so they can simply pull a value from the RNG rather than using the index counter and various moduli.)

Yes, the idea was to do that in the next PR.

Might as well just do it now, I guess, since I have the code.
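The RNG approach discussed above (it landed as commit 31c2207, "dataset sample prompt: % idx -> RNG") could look roughly like the sketch below. This is not the code from the PR. The SYS_PROMPTS contents, PromptSampler, assign_roles and the default seed are hypothetical. The prompt-count clamp and the role-parity rule are copied from the diff hunk. Applying roles[i % 2] to each turn is an assumption about how the roles list is consumed.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical prompts; the real SYS_PROMPTS list is not shown in the PR.
SYS_PROMPTS = [
    "You are a helpful assistant.",
    "Continue the conversation naturally.",
]


class PromptSampler:
    """Sketch: a dataset-private RNG picks the system prompt instead of idx % n."""

    def __init__(self, sys_prompts: Sequence[str], num_prompts: int, seed: int = 42):
        self.sys_prompts = list(sys_prompts)
        # Same clamp as the diff: never pick past the available prompts.
        self.num_prompts = min(num_prompts, len(self.sys_prompts))
        if self.num_prompts < 1:
            raise ValueError("need at least one system prompt")
        # Private RNG, so sampling never touches the global `random` state.
        self._rng = random.Random(seed)

    def sys_prompt(self) -> str:
        return self.sys_prompts[self._rng.randrange(self.num_prompts)]


def assign_roles(turns: Sequence[str]) -> List[Dict[str, str]]:
    # Parity rule from the diff: an even turn count starts with "user" and an
    # odd one with "assistant", so the final turn is always "assistant".
    roles = ["user", "assistant"] if len(turns) % 2 == 0 else ["assistant", "user"]
    return [{"role": roles[i % 2], "content": turn} for i, turn in enumerate(turns)]
```

One design caveat applies here. A fixed seed makes every copy of the dataset (for example, each data-loader worker) draw the same prompt sequence. A real implementation would likely want to seed per worker or per epoch.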

farzadab · 2024-07-08T16:44:54Z

ultravox/data/datasets.py

+            for column_name in self.BASE_AUDIO_COLUMNS:
+                dataset = dataset.cast_column(
+                    column_name, datasets.Audio(sampling_rate=SAMPLE_RATE)
+                )


Bugfix for datasets whose audio column is not named "audio".

This was not an issue for SODA, since it was already constructed at 16 kHz, but it was sloppy of me.
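A later commit in this PR (99c1409, "add check to make sure audio column is resampled") adds a sanity check after the cast loop. Its code is not shown here. A minimal stdlib-only sketch of such a check follows. The names AudioFeature and check_audio_resampled are hypothetical, and the target rate of 16 kHz is assumed from the note above. In Hugging Face datasets, dataset.features is a dict-like Features object and each datasets.Audio feature carries a sampling_rate attribute. The same function could therefore presumably be called as check_audio_resampled(dataset.features, self.BASE_AUDIO_COLUMNS).

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Assumed target rate: 16 kHz, the rate SODA was already built with.
SAMPLE_RATE = 16_000


@dataclass
class AudioFeature:
    """Hypothetical stand-in for a column's audio feature metadata."""
    sampling_rate: Optional[int]


def check_audio_resampled(
    features: Mapping[str, object],
    audio_columns: Sequence[str],
    sample_rate: int = SAMPLE_RATE,
) -> None:
    """Raise if any expected audio column is missing or not cast to `sample_rate`."""
    for name in audio_columns:
        if name not in features:
            raise KeyError(f"audio column {name!r} not found in dataset features")
        # Any object with a `sampling_rate` attribute works here.
        rate = getattr(features[name], "sampling_rate", None)
        if rate != sample_rate:
            raise ValueError(
                f"audio column {name!r} has sampling_rate={rate}, expected {sample_rate}"
            )
```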

farzadab added 8 commits June 19, 2024 11:59

soda dataset for training

f2b0d6d

soda: alt_last_turn + convo eval

ee93d20

Merge remote-tracking branch 'origin/main' into farzad-soda-train

ea2e9c6

bugfix: 6 -> 64 max_new_tokens for soda

be8083e

add soda to stage2 configs

aa146e7

fix formatting

b0e83cc

change prompt so truncation won't affect evaluation

c209dda

soda prompt fix

06b45ff

juberti reviewed Jun 21, 2024

ultravox/tools/infer_tool.py · outdated, resolved

ultravox/tools/infer_api.py · outdated, resolved

ultravox/evaluation/eval_types.py · outdated, resolved

ultravox/evaluation/gpt_eval.py · outdated, resolved

farzadab added 5 commits June 24, 2024 12:32

fix soda text-only

d9f8c23

fix t_end not defined

3c8cbd7

rename audio_one_but_last -> audio_second_last_turn

59a5742

allowing multiple messages in inference

20c27cb

eval Sample history: list of str to list of dict

87aaf19

farzadab force-pushed the farzad-soda-train branch 2 times, most recently from 6248f64 to 5ba015c, June 25, 2024 21:23

farzadab marked this pull request as ready for review June 25, 2024 21:25

separate gpt_evals + test for conv eval

e4ec48a

farzadab force-pushed the farzad-soda-train branch from 5ba015c to e4ec48a, June 25, 2024 21:29

juberti approved these changes Jul 6, 2024

farzadab added 3 commits July 8, 2024 09:23

make evaluate_answer_gpt public

075760d

dataset sample prompt: % idx -> RNG

31c2207

add check to make sure audio column is resampled

99c1409

farzadab commented Jul 8, 2024

farzadab merged commit e607220 into main Jul 8, 2024
1 check passed

farzadab deleted the farzad-soda-train branch July 8, 2024 18:16

cdiddy77 mentioned this pull request Jul 24, 2024

Make so infer_tools works with a single arg for filename #55

Merged