Updates in `VisionLanguageCollator` and in `coco_captions` #563

nikg4 · 2024-09-27T21:12:36Z

-- Update `coco_captions` to load image bytes from `example["image"]["bytes"]` when available (i.e., prefer `IMAGE_BINARY` over `IMAGE_PATH` where possible)
-- Raise more descriptive error messages when an expected key is missing from an example or a batch item
-- Add minimal_multimodal_training.py to VSCode launch config
-- Misc minor clean-ups
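
As a rough illustration of the first two items, a loader might prefer in-memory bytes and fall back to a path. This is a sketch, not the code in `coco_captions.py`: the function name `resolve_image_source`, the `"path"` key, and the returned tuple shape are assumptions made for illustration. Only `example["image"]["bytes"]` and the `IMAGE_BINARY` / `IMAGE_PATH` names come from the description above.

```python
from typing import Any, Dict, Tuple, Union


def resolve_image_source(example: Dict[str, Any]) -> Tuple[str, Union[bytes, str]]:
    """Pick the image source for one example, preferring raw bytes over a path.

    Hypothetical helper: the "path" key and the return shape are assumptions.
    """
    if "image" not in example:
        # Name the missing key and list what is present to ease debugging.
        raise ValueError(
            f"Example is missing the 'image' key. "
            f"Available keys: {sorted(example.keys())}"
        )
    image = example["image"]

    # Prefer in-memory bytes when they are present and non-empty.
    image_bytes = image.get("bytes")
    if image_bytes:
        return ("IMAGE_BINARY", image_bytes)

    # Otherwise fall back to a file path.
    image_path = image.get("path")
    if image_path:
        return ("IMAGE_PATH", image_path)

    raise ValueError(
        "Example 'image' has neither 'bytes' nor 'path'. "
        f"Available keys: {sorted(image.keys())}"
    )
```

Returning a tag alongside the payload keeps the choice of source explicit for the caller; a real dataset class would presumably build its own message or content object instead.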

Tested with local coco_captions dataset.

Towards OPE-353
Fixes OPE-354

linear · 2024-09-27T21:16:26Z

linear · 2024-09-27T23:16:15Z

OPE-354 [bug] Make sure llava, blip-2 (and others) work with simple vision language datasets

For now, they work with a dummy template

src/oumi/datasets/vision_language/coco_captions.py

oelachqar · 2024-10-01T05:11:39Z

src/oumi/builders/collators.py

+        images = []
+        text_inputs = []
+        for item in batch:
+            for required_key in (_PIXEL_VALUES_KEY, _INPUT_IDS_KEY):


I'm not sure these are always required -- a vision/language model can handle text-only inputs (e.g. a follow-up to an answer) and image-only inputs (e.g. captioning).

Added a TODO to reconsider this. Note that this PR only raises a better error message; it doesn't change the validation condition.
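
For context, the check under discussion could look roughly like the following. It is a sketch under assumptions: the constant values `"pixel_values"` and `"input_ids"`, the helper name `split_batch`, and the error message wording are guesses for illustration, not taken from `collators.py`. Only the constant names and the loop shape come from the diff above.

```python
from typing import Any, Dict, List, Tuple

# Assumed values; the diff shows only the constant names.
_PIXEL_VALUES_KEY = "pixel_values"
_INPUT_IDS_KEY = "input_ids"


def split_batch(batch: List[Dict[str, Any]]) -> Tuple[List[Any], List[Any]]:
    """Split a batch into image and text inputs, validating keys per item."""
    images = []
    text_inputs = []
    for item in batch:
        # TODO: reconsider whether both keys must always be present
        # (text-only and image-only inputs are plausible).
        for required_key in (_PIXEL_VALUES_KEY, _INPUT_IDS_KEY):
            if required_key not in item:
                # Same validation condition as before, with a clearer message.
                raise ValueError(
                    f"Item doesn't contain '{required_key}' key. "
                    f"Available keys: {sorted(item.keys())}"
                )
        images.append(item[_PIXEL_VALUES_KEY])
        text_inputs.append(item[_INPUT_IDS_KEY])
    return images, text_inputs
```

Listing the keys that are present makes it easier to spot a misnamed column than a bare `KeyError` would.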

nikg4 added 20 commits September 24, 2024 12:51

SkyPilot command

f61c2d0

save

d28c0c4

Merge branch 'main' of https://github.com/oumi-ai/oumi into xrdaukar/…

279b595

…gx-llava-gcp

Merge branch 'main' of https://github.com/oumi-ai/oumi into xrdaukar/…

c96fef4

…gx-llava-gcp

split

d765bfb

save

09258c0

Merge branch 'xrdaukar/gx-llava-gcp' of https://github.com/oumi-ai/oumi…

227c450

… into xrdaukar/gx-llava-gcp

a

45ec93d

merge

9b714b8

test

1f900c3

hf_dataset

a472850

load_from_disk

48fdf5d

Merge https://github.com/oumi-ai/oumi into xrdaukar/gx-llava-gcp

381f627

save

652aa0f

save

eafb17b

merge

adb74c4

merge

8644775

merge

c64fe16

clean

1b015f6

clean

ad3d69b

nikg4 changed the title ~~Updates in coco_captions dataset and in VisionLanguageCollator~~ Updates in VisionLanguageCollator and in coco_captions Sep 27, 2024

nikg4 added 3 commits September 27, 2024 15:27

clean

9bb2c01

_get_test_png_image_bytes

be14a22

_get_test_png_image_bytes

013f8e7

nikg4 marked this pull request as ready for review September 27, 2024 22:31

nikg4 requested review from optas, oelachqar, taenin and wizeng23 September 27, 2024 22:32

taenin reviewed Sep 30, 2024

src/oumi/datasets/vision_language/coco_captions.py (outdated)

taenin approved these changes Sep 30, 2024

nikg4 added 3 commits September 30, 2024 10:42

merge

765cb77

Merge branch 'xrdaukar/gx-llava-gcp' of https://github.com/oumi-ai/oumi…

cbdf35e

… into xrdaukar/gx-llava-gcp

PR feedback

05833ad

oelachqar approved these changes Oct 1, 2024

PR feedback

6f2efb6

nikg4 merged commit 4cc82d6 into main Oct 1, 2024
1 check passed

nikg4 deleted the xrdaukar/gx-llava-gcp branch October 1, 2024 16:26
