pixtral SFT #296

shruthan · 2025-06-11T17:46:41Z

✨ Description

Some bug fixes for image+text SFT (supervised fine-tuning) runs.


sohamparikh · 2025-06-11T17:48:58Z

fast_llm/data/dataset/gpt/memmap.py

+                #  not sure of assignment, reading flag to indicate whether preference loss-masking spans are present
+                self._has_preference_spans = struct.unpack("<B", stream.read(1))[0]


It's already read above, so why read it again here?

There's another flag written after the images flag. This line reads that one, but I'm not sure what it should be assigned to.

Fast-LLM/fast_llm/data/dataset/gpt/memmap.py, lines 407 to 412 in 2f85615:

    # Placeholder flag for preference spans
    idx_stream.write(struct.pack("<B", 0))
    # Flag to indicate whether images are present
    idx_stream.write(struct.pack("<B", 1 if total_images > 0 else 0))
    # Flag to indicate whether preference loss-masking spans are present
    idx_stream.write(struct.pack("<B", 1 if chosen_spans.size > 0 and rejected_spans.size > 0 else 0))

Oh yeah, that order should be flipped: the chosen_spans byte should be written before the total_images byte. I'll fix it in my branch.

This would break files with version == 3, right?

It seems to me that we should instead fix the order in which those flags are written to the idx file below.

@RaymondLi0 yes, I'm planning to fix it in #227
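For illustration, the ordering problem discussed in this thread can be avoided by having the writer and the reader iterate over one shared list of flags. This is a minimal sketch, not Fast-LLM's actual index format: `FLAG_ORDER`, `write_flags`, and `read_flags` are hypothetical names, and the real file contains other fields around these bytes.

```python
import io
import struct

# Hypothetical flag names and order, for illustration only.
FLAG_ORDER = ("has_preference_spans", "has_images")


def write_flags(stream, flags):
    """Write one unsigned byte per flag, in FLAG_ORDER."""
    for name in FLAG_ORDER:
        stream.write(struct.pack("<B", 1 if flags[name] else 0))


def read_flags(stream):
    """Read the flags back in the same FLAG_ORDER used by write_flags."""
    return {name: bool(struct.unpack("<B", stream.read(1))[0]) for name in FLAG_ORDER}


# Round trip through an in-memory stream.
buffer = io.BytesIO()
written = {"has_preference_spans": True, "has_images": False}
write_flags(buffer, written)
buffer.seek(0)
restored = read_flags(buffer)
```

Because both functions walk the same tuple, reordering the flags changes the writer and the reader together. Files already written with an older order would still need a version check, which is the concern raised above.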

sohamparikh · 2025-06-11T17:52:24Z

fast_llm/data/dataset/gpt/memmap.py

+            total_pixels_needed = sum(
+                length[0] * length[1] * 3 for length in self._image_lengths[idx]
+            )


Suggested change

-            total_pixels_needed = sum(
-                length[0] * length[1] * 3 for length in self._image_lengths[idx]
-            )
+            total_pixels_needed = self._image_lengths[idx].prod(initial=3, axis=1).sum()
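A quick check that the two forms agree, assuming the per-document image lengths are stored as an integer array of shape (n_images, 2). The array contents below are made up for the example.

```python
import numpy as np

# Two hypothetical images: 4x6 and 2x3 (height, width).
image_lengths = np.array([[4, 6], [2, 3]])

# Generator form from the PR: height * width * 3 channels per image.
loop_total = sum(int(length[0]) * int(length[1]) * 3 for length in image_lengths)

# Vectorized form from the suggestion: per-row product seeded with 3, then summed.
vector_total = int(image_lengths.prod(initial=3, axis=1).sum())
```

One difference to keep in mind: `axis=1` requires a 2-D array, so a document whose lengths were stored as an empty 1-D array would raise an axis error, while an array of shape (0, 2) simply sums to 0.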

sohamparikh · 2025-06-11T17:52:51Z

fast_llm/data/dataset/gpt/memmap.py

                offset=self._pointers[idx] + self._document_sizes[idx] * np.dtype(self._dtype).itemsize,
            )
            images = []
            start = 0
            for image_length in self._image_lengths[idx]:
-                n_pixels = image_length.prod(initial=3)
+                n_pixels = image_length[0] * image_length[1] * 3


Can this stay as .prod?
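As a sketch of the loop in this hunk, the snippet below splits a flat pixel buffer into per-image arrays using `.prod(initial=3)` for the pixel count. The helper name `split_images` and the channels-first (3, height, width) layout are assumptions made for the example, not taken from the PR.

```python
import numpy as np


def split_images(pixels, image_lengths):
    """Slice a flat pixel buffer into one array per image."""
    images = []
    start = 0
    for image_length in image_lengths:
        # height * width * 3 channels, same value as the explicit product.
        n_pixels = int(image_length.prod(initial=3))
        # Assumed layout: channels first.
        images.append(pixels[start : start + n_pixels].reshape(3, image_length[0], image_length[1]))
        start += n_pixels
    return images


# Hypothetical data: a 4x6 image followed by a 2x3 image.
lengths = np.array([[4, 6], [2, 3]])
flat = np.arange(90, dtype=np.uint8)
images = split_images(flat, lengths)
```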

sohamparikh · 2025-06-11T17:59:16Z

fast_llm/data/dataset/gpt/sampled.py

@@ -549,7 +549,7 @@ def __getitem__(self, index: int) -> typing.Any:
                    use_loss_masking_spans=self._parameters.use_loss_masking_spans,
                )
                start_pos = 0
-                if sample.image_positions:
+                if sample.image_positions is not None:


Use a bool, has_images = bool(sample.image_positions), and use it below as well?

I think bool(sample.image_positions) will throw "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" when there is more than one image position, e.g. bool(np.array([2, 3])).

Or simply has_images = True if sample.image_positions else False?

if sample.image_positions would throw the same error, so I changed it to has_image_positions = sample.image_positions is not None.
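The behaviour described in this thread can be reproduced in isolation. The helper name `has_image_positions` below is hypothetical, and the sketch assumes image positions are either `None` (text-only sample) or a NumPy array.

```python
import numpy as np


def has_image_positions(image_positions):
    """Identity check against None; never evaluates the array's truth value."""
    return image_positions is not None


# Truth-testing a multi-element array is ambiguous and raises ValueError.
try:
    bool(np.array([2, 3]))
    truthiness_raised = False
except ValueError:
    truthiness_raised = True

with_images = has_image_positions(np.array([2, 3]))
text_only = has_image_positions(None)
```

Note that the `is not None` check also returns True for an empty array, so it distinguishes "no positions recorded" from "positions present" only if text-only samples carry `None`.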

sohamparikh

LGTM! Thanks!

shruthan added 4 commits June 8, 2025 21:31

add fixes for mm sft

4382dab

fix comment

810e7d9

simpler approach for im_positions

1875df1

fix for text only samples

2f85615

shruthan requested a review from sohamparikh June 11, 2025 17:46

sohamparikh reviewed Jun 11, 2025

review comments

2fb5125

sohamparikh approved these changes Jun 11, 2025

Apply suggestions from code review

e1505bc

sohamparikh merged commit 275fefa into soham/pixtral-support Jun 11, 2025

sohamparikh deleted the shruthan/pixtral-sft branch June 11, 2025 19:05
